diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000000..cb76e124ad --- /dev/null +++ b/.gitignore @@ -0,0 +1,32 @@ +README.html +README_files/ +*.DS_Store +*__pycache__ +*.h5ad +changelogs + +# IDE ignores +/.idea/ +/.vscode/ + +# repo specific ignores +output_bash + +# R specific ignores +.Rhistory +.Rproj.user +*.Rproj + +# viash specific ignores +docker_output/ + +check_results/ +log.txt +.viash* +/resources/ +/resources_test/ + +# nextflow specific ignores +/.nextflow* +/work +output diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000000..e69de29bb2 diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000000..3634f223dd --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,1100 @@ +# openproblems v2.0.0 + +A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable. + +Structure: + +* `src/common`: Common components used by all tasks. +* `src/datasets`: Components for fetching and processing datasets. +* `src/tasks`: Benchmarking tasks + - `batch_integration`: Batch integration + - `denoising`: Denoising + - `dimensionality_reduction`: Dimensionality reduction + - `match_modalities`: Match modalities + - `predict_modality`: Predict modality + - `spatial_decomposition`: Spatial decomposition + - `spatially_variable_genes`: Spatially variable genes +* `src/wf_utils`: Workflow utilities. + +For more information related to the structure of this repository, see the [documentation](https://openproblems.bio/documentation/reference/openproblems/). + + +# openproblems v1.0.0 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added `cell2location` to the `spatial_decomposition` task. +- Added nearest-neighbor ranking matrix computation to `_utils`. +- Datasets now store nearest-neighbor ranking matrix in `adata.obsm["X_ranking"]`. +- Added support for parsing Nextflow output and generating benchmark results for the website. +- Added `max_samples` parameter to `qlocal`, `qglobal`, `qnn_auc`, `lcmc`, `qnn`, and `continuity` metrics to allow for subsampling of data for faster computation. +- Added new scArches based methods: `scarches_scanvi_xgb_all_genes` and `scarches_scanvi_xgb_hvg`. +- Added `prediction_method` parameter to `_scanvi_scarches` to specify prediction method. +- Added `_pred_xgb` function to perform XGBoost prediction based on latent representations. +- Added `obsm` parameter to `_xgboost` function to allow specifying the embedding space for XGBoost training. + +## Major changes +- Updated `scvi-tools` to version `0.20` in both Python and R environments. +- Updated datasets to include nearest-neighbor ranking matrix. +- Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation. +- The website update workflow was refactored to use a new workflow using json instead of markdown. +- Updated the website generation process to remove duplicate BibTex entries. +- Added a new `parse_metadata.py` script for generating metadata for the website. +- Added a new function to `openproblems.utils.py` to get the member ID of a task, dataset, method or metric. +- Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets. 
+ +## Minor changes +- Updated method names to be shorter and more consistent across tasks. +- Improved method summaries for clarity. +- Updated JAX and JAXlib versions to 0.4.6. +- Updated dependencies to support new versions of Snakemake and GitPython. +- Removed code related to "nbt2022-reproducibility" repo and merged it into the main website. +- Updated the schema for benchmark results to include submission time, code version, and resource usage metrics. +- Improved error handling and added logging to the parsing script. +- Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file. +- Updated the workflow to upload the final results to the website's results directory instead of the data directory. +- Removed unnecessary code and refactored the parsing script for better readability. +- Added unit tests for the new parsing script. +- Updated the `run_tests` workflow to skip testing on the `test_website` branch. +- Updated the `run_tests` workflow to skip testing on the `test_process` branch. +- Updated the `create-pull-request` step to set the author for the pull request. +- Updated the `run_tests` workflow to skip testing on pull request reviews. +- Updated the `update_website_content` workflow to update the website on the `main` branch. +- Updated the `main.bib` file to fix a typo. +- Removed extraneous headings from task README files. +- Updated `generate_test_matrix.py` to use the new `openproblems.utils.get_member_id` function. +- Updated the website generation process to copy BibTex files to the correct location. +- Updated the `process_requires` section in `setup.py` to include `gitpython`. +- Updated git commit hash generation for openproblems functions. +- Modified `_xgboost` to allow for specifying `tree_method`. +- Modified `_scanvi_scarches` to consistently use `unlabeled_category`. +- Modified `_scanvi_scarches` to remove unnecessary copying of `labels`. +- Removed `_scanvi_scarches` functions that were redundant with `_scanvi_scarches`. +- Removed unused `_scanvi` functions. +- Modified `_scanvi_scarches` to allow for specifying `prediction_method` and handle `unlabeled_category` consistently. + +## Documentation +- Improved the documentation of the `auprc` metric. +- Improved the documentation of the `cell2location` methods. +- Document sub-stub task behaviour + +## Bug fixes +- Fixed an error in `neuralee_default` where the `subsample_genes` argument could be too small. +- Fixed an error in `knn_naive` where the `is_baseline` argument was set to `False`. +- Fixed calculation of ranking matrix in `_utils` to include ties. +- Fixed a bug in `load_tenx_5k_pbmc()` where a warning about non-unique variable names was being raised. +- Removed the unused `_utils.py` file. +- Removed the `X_ranking` entry from the `obsm` attribute of datasets. +- The `_fit()` function in `nn_ranking.py` now subsamples the data if `max_samples` is specified. +- The `nn_ranking` metrics now use subsampling in the `_fit()` function to improve performance. +- Fixed the git hash generation for openproblems functions +- Fixed a warning about `pkg_resources` being deprecated +- Removed unnecessary `fetch-depth: 1` from workflow +- Fixed potential issue in `_scanvi_scarches` where `labels_pred` could be overwritten +- Fixed potential issue in `_pred_xgb` where `num_round` wasn't being used correctly +- Fixed an issue where baseline methods were not being filtered correctly from the benchmark results. 
+- Fixed an issue where metrics with all NaN values were not being removed from the benchmark results. +- Fixed an issue where some metrics were not being parsed correctly from the Nextflow output. +- Fixed an issue where the "mean_score" field was not being calculated correctly for each method. +- Fixed an issue where the "code_version" field was not being populated correctly for each method. +- Fixed an issue where the "submission_time" field was not being populated correctly for each method. +- Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output. + + # openproblems v0.8.0 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added the zebrafish_labs dataset to the dimensionality reduction task. +- Added the `diffusion_map` method to the dimensionality reduction task. +- Added the `spectral_features` method to the dimensionality reduction task, which uses diffusion maps to create embedding features. +- Added the `distance_correlation_spectral` metric to the dimensionality reduction task, which evaluates the similarity of the high-dimensional Laplacian eigenmaps on the full data matrix and the dimensionally-reduced matrix. +- Added baseline methods for batch integration: no integration, random integration, random integration by cell type, random integration by batch. +- Added `alra_sqrt_reversenorm`, `alra_log_reversenorm` methods for ALRA with reversed normalization order. +- Added `celltype_random_embedding_jitter` method to randomize embedding with jitter. + +## Minor changes +- Improved the `density_preservation` metric calculation. +- Updated the `distance_correlation` metric to use the new `diffusion_map` method. +- Increased the default number of components used for `distance_correlation_spectral` to 1000. +- Made metrics more robust by copying the AnnData object before passing it to the metric function. +- Added `is_baseline` flag to `adata.uns` in `method` decorator. +- Added `is_baseline` field to `adata.uns` for all methods. +- Increased default values for `max_epochs_sp` and `max_epochs_sc` in `destvi` method. +- Changed default value of `early_stopping_monitor` to `elbo_validation` from `reconstruction_loss_train` in `destvi` method. +- Added `train_size` and `validation_size` arguments to the `sc_model.train` call in `destvi` method. +- Added `batch_size` and `plan_kwargs` arguments to the `st_model.train` call in `destvi` method. +- Refactor ALRA methods for improved clarity and consistency. +- Added tests for new ALRA methods with reversed normalization order. +- Added jitter parameter to `_random_embedding` function. +- Updated `celltype_random_embedding` to use `jitter=None` in `_random_embedding`. +- Removed unnecessary parameters from the `sample_dataset` function. +- Removed unnecessary checks for PCA and neighbors in the `check_dataset` function. +- Updated `pytest.ini` to ignore deprecation warning related to `pkg_resources`. +- Added permission to all workflows to read and write contents +- Added permission to write pull requests to several workflows +- Added permission to write packages to the `run_tests` workflow. 
+ +## Bug fixes +- Fixed a bug in `density_preservation` that caused it to return 0 when there were NaN values in the embedding. +- Removed unused `true_features_log_cp10k` and `true_features_log_cp10k_hvg` methods. +- Removed unnecessary imports in metrics. +- Removed unnecessary `neighbors` calls in metrics. +- Removed unused `_get_split` function. +- Added `embedding_to_graph` and `feature_to_graph` functions for graph-based metrics. +- Added `get_split` function for metrics that require splitting data into training and testing sets. +- Added `feature_to_embedding` function for embedding-based metrics. +- Fixed issue where baseline methods were not properly documented. +* Increased default maximum epochs for spatial models to improve performance. +* Improved training parameters for both spatial and single-cell models to improve stability and performance. +* Updated validation metric used for early stopping in spatial model to improve training quality. + +## Documentation +- Updated documentation to clarify that the AnnData object passed to metric functions is a copy. +- Updated the documentation for batch integration tasks to reflect the change in the expected format of the dataset objects. + +## Major changes +- Moved baseline methods from individual task modules to a common module. +- Removed redundant baseline methods from individual task modules. +- Increased default values for `max_epochs_sp` and `max_epochs_sc` in `destvi` method. + + +# openproblems v0.7.4 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added metadata for all datasets, methods, and metrics. + +## Major changes +- Updated nf-openproblems to v1.10. + +## Minor changes +- Added a new `docker_pull` rule to the Snakemake workflow to pull Docker images. +- Added a new `docker` rule to the Snakemake workflow to build Docker images. +- Changed the `pytest` command to include coverage for the `test` directory. +- Added new environment variables for the TOWER_TEST_ACTION_ID and TOWER_FULL_ACTION_ID to the Snakemake workflow. +- Updated the `scripts/install_renv.R` script to increase the number of retry attempts. + + +# openproblems v0.7.3 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Updated `scib` version to `1.1.3` in `docker/openproblems-r-extras/requirements.txt` and `docker/openproblems-r-pytorch/requirements.txt`. +## Bug fixes +- Added `pytest-timestamper` to test dependencies for better debugging. + + +# openproblems v0.7.2 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed an issue where pymde did not work on sparse data. + + +# openproblems v0.7.1 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Added `hvg_unint` and `n_genes_pre` to the lung batch. + + +# openproblems v0.7.0 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a bibtex file `main.bib` for storing all references cited in the repository. +- Added a section on adding paper references to `CONTRIBUTING.md` explaining how to add entries to `main.bib` and link to them in markdown documents. +- Added new baseline methods for dimensionality reduction: "True Features (logCPM)", "True Features (logCPM, 1kHVG)". +- Added `alra_log` method, which implements ALRA with log normalization. +- Added `alra_sqrt` method, which implements ALRA with square root normalization. 
+- Added PyMDE dimensionality reduction methods +- Added citations for Chen et al. (2009) "Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis", Kraemer et al. (2018) "dimRed and coRanking - Unifying Dimensionality Reduction in R", Lee et al. (2009) "Quality assessment of dimensionality reduction: Rank-based criteria", Lueks et al. (2011) "How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix", Szubert et al. (2019) "Structure-preserving visualisation of high dimensional single-cell datasets", and Venna et al. (2006) "Local multidimensional scaling". +- Added `install_renv.R` script to install R packages using `renv` with retries +- Added a new metric to evaluate the conservation of highly variable genes (HVGs) after batch integration. +- Added support for lung data from Vieira Braga et al. +- Added `magic_reverse_norm` and `magic_approx_reverse_norm` methods which reverse the order of normalization and transformation in the MAGIC algorithm. +- Added a new workflow to comment on pull request status. + +## Major changes +- Updated the `openproblems` repository to cite papers using bibtex references. +- Renamed `alra` method to `alra_sqrt`. +- Updated `spacexr` to latest version. +- Added `fc_cutoff` and `fc_cutoff_reg` parameters to `rctd` method to control minimum log-fold-change for genes in the normalization and RCTD steps. +- Renamed the "multimodal_data_integration" task to "matching_modalities". +- Bumped version to 0.7.0. + +## Minor changes +- Added BibTex references to all data loaders in `openproblems/data`. +- Added BibTex references to all methods in `openproblems/tasks`. +- Added BibTex references to all metrics in `openproblems/tasks`. +- Updated `update_website_content.yml` to copy `main.bib` to the Open Problems website. +- Added a BibTeX Tidy hook to `.pre-commit-config.yaml`. +- Updated `scvi-tools` version to `~0.19` in both `openproblems-python-pytorch` and `openproblems-r-pytorch` dockerfiles. +- Updated `cell2location` version to `47c8d6dc90dd3f1ab639861e8617c6ef0b62bb89` in the `openproblems-python-pytorch` dockerfile. +- Updated `bslib` to version 0.4.2. +- Updated `htmltools` to version 0.5.4. +- Updated the `alra_sqrt` method to use square root normalization. +- Updated the `alra_log` method to use log normalization. +- Updated the method names to reflect the normalization used. +- Updated dependencies for `gtfparse` and `polars`. +- Added PyMDE dependency to requirements.txt +- Updated the API to specify that datasets should provide log CPM-normalized counts in `adata.X`. + + # openproblems v0.6.1 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added `cell2location_detection_alpha_1` method, which uses `detection_alpha=1` and a hard-coded reference. +- Added a new parameter `hard_coded_reference` to `cell2location_detection_alpha_1` method. +- Added a new baseline method for dimensionality reduction using high-dimensional Laplacian Eigenmaps. +- Added organism metadata to datasets. +- Added a new image, `openproblems-python-bedtools`, to contain packages required for running `pybedtools` and `pyensembl` Python packages. +- Added support for TensorFlow 2.9.0. +- Added a new schema for storing results in JSON format. +- Added a new function to parse Nextflow trace files to this JSON schema. 
+- Added `rmse_spectral` metric, which calculates the root mean squared error (RMSE) between high-dimensional Laplacian eigenmaps on the full (or processed) data matrix and the dimensionally-reduced matrix. +- Added new methods to LIANA: `magnitude_max`, `magnitude_sum`, `specificity_max`, and `specificity_sum`. +- Added `aggregate_how` parameter to `liana` R function to allow aggregation by "magnitude" or "specificity". +- Added `top_prop` parameter to `odds_ratio` metric to allow specifying the proportion of interactions to consider for calculating the odds ratio. + +## Major changes +- Removed unused `openproblems-python-batch-integration` docker image. +- Moved `scanorama`, `bbknn`, `scVI`, `mnnpy` and `scib` from `openproblems-python-batch-integration` to `openproblems-r-pytorch`. +- Moved `cell2location`, `molecular-cross-validation`, `neuralee`, `tangram` and `phate` from `openproblems-python-extras` to `openproblems-python-pytorch`. +- Moved `pybedtools`, `pyensembl` and `scalex` from `openproblems-python-extras` to `openproblems-python-pytorch`. +- Moved `dca` and `keras` from `openproblems-python-tf2.4` to `openproblems-python-tensorflow`. +- Added `openproblems-python-bedtools` docker image. +- Added `openproblems-python-tensorflow` docker image. +- Added `openproblems-python-pytorch` docker image. +- Moved `harmony-pytorch` from `openproblems-r-extras` to `openproblems-r-pytorch`. +- Added `openproblems-r-pytorch` docker image. +- Updated `anndata2ri` version in `openproblems-r-base`. +- Updated `kBET` version in `openproblems-r-extras`. +- Updated `scib` version in `openproblems-r-extras`. +- Updated `scvi-tools` version in `openproblems-r-pytorch`. +- Updated `torch` version in `openproblems-r-pytorch`. +- Moved the `codecov` action to run only on success +- Updated the workflow to upload coverage reports to GitHub Actions as an artifact +- Renamed the `run_benchmark` job to `setup_benchmark`. +- Added a new `run_benchmark` job that runs after `setup_benchmark`. +- Moved the benchmark running logic from the `run_benchmark` job to the new `run_benchmark` job. +- Added a `setup-environment` step to `setup_benchmark` job. +- Added outputs to the `setup_benchmark` job. +- Renamed the `nbt2022-reproducibility` to `website-experimental` + +## Minor changes +- Updated `numpy` and `scipy` dependencies in setup.py. +- Updated `scikit-learn`, `louvain`, `python-igraph`, `decorator` and `colorama` dependencies in setup.py. +- Improved Docker image caching. +- Removed the `counts` layer from the `immune_cells`, `pancreas` datasets, and the `batch_integration_feature` task. +- Removed the `counts` layer from `generate_synthetic_dataset` functions in spatial decomposition datasets. +- Updated the `normalize` functions to not modify the data in place. +- Updated the `log_cpm_hvg` function to annotate HVGs instead of subsetting the data. +- Updated the `_high_dim` function in the `nn_ranking` metric to subset to HVGs. +- Updated the `dimensionality_reduction` task README to clarify the role of the `highly_variable` key. +- Reduced the random noise added to the one-hot embedding in the `_random_embedding` function from (-0.1, 0.1) to (-0.01, 0.01). +- Removed `high_dim_pca` and `high_dim_spectral` methods. +- Updated the `random_features` method to use the `check_version` function. +- Moved raw output files from website to the NBT 2022 reproducibility repository. +- Updated the `process_results.yml` workflow to include the NBT 2022 reproducibility repository. 
+- Updated the `run_tests.yml` workflow to skip tests when pushing to specific branches. +- Removed `# ci skip` from commit message in CI workflow. +- Removed redundant file deletion from `process_results.yml` workflow. +- Added `update_website_content.yml` workflow to update benchmark content on the website repository. +- Modified the `process_results.yml` workflow to update website content based on results. +- Changed the `update_website_content.yml` workflow to trigger on both the `main` and `test_website` branches. +- Updated workflow to push changes to the website only if there are changes to the website content. +- Added environment variable to track changes. +- Removed unused git command. +- Decreased number of samples for testing. +- Updated `igraph` to 0.10.* in `setup.py`. +- Updated `anndata2ri` to 1.1.* in `openproblems-r-base/README.md`. +- Updated `kBET` to `a10ffea` in `openproblems-r-extras/r_requirements.txt`. +- Updated `scib` to `f0be826` in `openproblems-r-extras/requirements.txt`. +- Updated `harmony-pytorch` to 0.1.* in `openproblems-r-pytorch/requirements.txt`. +- Updated `torch` to 1.13.* in `openproblems-r-pytorch/requirements.txt`. +- Updated `scanorama` to 1.7.0 in `openproblems-r-pytorch/requirements.txt`. +- Updated `scvi-tools` to 0.16.* in `openproblems-r-pytorch/requirements.txt`. +- Updated the `regulatory_effect_prediction` task to use + + +# openproblems v0.6.0 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a new dataset: "Pancreas (inDrop)" +- Added a new function: "pancreas" +- Added a new utility function: "utils.split_data" +- Added `tabula_muris_senis_lung_random` dataset. +- Added `celltype_random_embedding` baseline method for batch integration embedding. +- Added `celltype_random_graph` baseline method for batch integration graph. +- Added a new argument `sctransform_n_cells` to the seuratv3 function to allow users to specify the number of cells used to build the negative binomial regression in the SCTransform function. +- Added a new sample dataset that is smaller and more efficient than the previous one. +- Added a "mean score" metric to the results table. +- Added support for loading the sample dataset in `load_sample_data`. +- Added support for running benchmarks on pull requests. +- Added a new workflow for creating a test matrix. +- Added a new script to generate a test matrix for the `run_tester` workflow. +- Added a new script for cleaning up runner diskspace. +- Added support for uploading docker images to ECR. + +## Minor changes +- Added `tabula_muris_senis` dataset to `openproblems/tasks/denoising/datasets/__init__.py`. +- Updated `styler` to version 1.8.1. +- Updated the method for normalizing scores to correctly account for baseline method scores. +- Improved the way NaN and infinite values are handled in the ranking calculation. +- Removed redundant code that was previously used to upload results and markdown artifacts to test. +- Removed the raw output files from the website data directory. +- Updated the list of reviewers for the pull request to include more relevant team members. +- Changed the reference to "Code" to "Library" in the JSON output to better reflect the data presented. +- Added a check to ensure that the task has a minimum number of non-baseline methods before processing results. +- Removed the check to ensure that the task has a minimum number of methods before processing results. +- Removed redundant code that was previously used to handle incomplete tasks. 
+- Updated the workflow to use a consistent version of Python across all jobs. +- Updated flake8 dependency to `https://github.com/pycqa/flake8`. +- Improved random embedding for `celltype_random_embedding` and `celltype_random_graph`. +- Removed `pip check` from Dockerfile. +- Updated code to use a more consistent random number generator. +- Updated liana code to inverse the distribution of the aggregate rank. +- Improved the logic in `odds_ratio` to ensure that the numerator/denominator is not zero. +- Removed unnecessary NXF_DEFAULT_DSL from `run_tester` workflow. +- Increased the number of cells used to build the negative binomial regression in the SCTransform function from 3000 to 5000. +- Adjusted the default values for `n_pca` and `sctransform_n_cells` in the seuratv3 function for test and non-test cases. +- Updated the seuratv3_wrapper.R script to pass the `sctransform_n_cells` argument to the SCTransform function. +- Moved the sample dataset from the `multimodal` folder to the `sample` folder. +- Refactored the sample data generation to be more efficient. +- Modified the `compute_ranking` function to calculate and add the "mean score" to the `dataset_results` dictionary. +- Updated the `dataset_results_to_json` function to include the "mean score" in the results table. +- Updated the pull request template to reflect recent changes and improvements in the workflow. +- Updated the workflow to include a new `test_full_benchmark` branch. +- Removed redundant code from the workflow. + + +# openproblems v0.5.21 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a new metric, AUPRC, for evaluating cell-cell communication predictions. +- Added support for aggregating method scores using "max" and "sum" operations. +- Implemented a new method, true events, which predicts all possible interactions. +- Added a new method, random events, which randomly predicts interactions. +- Implemented LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods with the option to aggregate scores using "max" or "sum." +- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication ligand-target task. +- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication source-target task. + +## Bug fixes +- Fixed a bug where the odds ratio metric was not handling cases where the numerator or denominator was zero. + +## Minor changes +- Updated the IRkernel package version in the R base docker image to 1.3.1. +- Updated the saezlab/liana package version in the R extras docker image to 0.1.7. +- Updated the boto3 package version in the main docker image to 1.26.*. +- Added a check to the cell-cell communication dataset validation to ensure that there are no duplicate entries in the target data. +- Updated the documentation for the cell-cell communication ligand-target task. +- Updated the documentation for the cell-cell communication source-target task. + + +# openproblems v0.5.20 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed an issue where a sparse matrix was not being converted to CSR format. +- Fixed a bug in `docker_run.sh` where pip check was not being executed. + +## Minor changes +- Updated `pkgload` to version 1.3.1. + + +# openproblems v0.5.19 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Converted sparse matrix to csr format. 
+ + +# openproblems v0.5.18 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Converted sparse matrices to CSR format. + + +# openproblems v0.5.17 + +Note: This changelog was automatically generated from the git log. + + + + +# openproblems v0.5.16 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed a bug where the bioconductor version was incorrect. +- Fixed a bug where the matrix in obs was incorrect. +## Minor changes +- Updated the scran package to version 1.24.1. +- Updated the batchelor and scuttle packages. + + +# openproblems v0.5.15 + +Note: This changelog was automatically generated from the git log. + + + + +# openproblems v0.5.14 + +Note: This changelog was automatically generated from the git log. + +## Major changes +- Updated workflow to run tests against `prod` branch. + + +# openproblems v0.5.13 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Skip benchmark if tester fails. + + +# openproblems v0.5.12 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Explicitly push prod images on tag + +## Documentation +- Added short metric descriptions to README + +## Minor changes +- Added labels tests + + +# openproblems v0.5.11 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Reverted bump of louvain to 0.8, which caused issues. + +## Minor changes +- Updated torch requirement to 1.13 in the openproblems-r-pytorch docker. + + +# openproblems v0.5.10 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added support for SCALEX version 1.0.2. + +## Minor changes +- Updated RcppAnnoy to version 0.0.20. +- Updated SageMaker requirement to version 2.116.*. + +## Bug fixes +- Fixed a bug in the `docker_hash` function, which now returns a string instead of an integer. +- Fixed a bug in the `scalex` method, which now correctly handles the `outdir` parameter. + + +# openproblems v0.5.9 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Update rpy2 requirement from <3.5.5 to <3.5.6 +- Update ragg to 1.2.4 +## Bug fixes +- Don't fail job if hash fails + + +# openproblems v0.5.8 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Updated scIB to 77ab015. + + +# openproblems v0.5.7 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a new batch integration subtask for corrected feature matrices. +- Added a new sub-task for batch integration, "batch integration embed", which includes all methods that output a joint embedding of cells across batches. +- Added a new sub-task for batch integration, "batch integration graph", which includes all methods that output a cell-cell similarity graph (e.g., a kNN graph). + +# openproblems v0.5.6 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed an issue where the `::` in branch names would cause problems. +- Fixed an issue where the `check_r_dependencies.yml` workflow was not properly handling branch names with `::`. +## Minor changes +- Updated the `caret` package to version 6.0-93. +- Updated the README to include information about the Open Problems team and task leaders. +- Replaced the `NuSVR` method with a faster alternative, improving performance. 
+## New functionality +- Added a new method for running Seuratv3 from a fork, allowing for more efficient use of resources. +- Added a new requirement to the `r_requirements.txt` file for the `bslib` package. +- Added a new requirement to the `r_requirements.txt` file for the `caret` package. +## Documentation +- Added a new section to the README to document the process of running Seuratv3 from a fork. +- Updated the README to include a list of all contributors to the Open Problems project. + + +# openproblems v0.5.5 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fix sampling and reindexing +- Fix docker unavailable error to include image name + +## New functionality +- Require minimum celltype count for `spatial_decomposition` + +## Minor changes +- Update Rcpp to 1.0.9 +- Update to nf-openproblems v1.7 + + +# openproblems v0.5.4 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed an issue where some cell types were missing from the output. + + +# openproblems v0.5.3 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Fixed a bug in the rctd method where cell types with fewer than 25 cells were not being used. + + +# openproblems v0.5.2 + +Note: This changelog was automatically generated from the git log. + +## Bug fixes +- Handle missing function error by catching FileNotFoundError and NoSuchFunctionError instead of just RuntimeError. + + +# openproblems v0.5.1 + +Note: This changelog was automatically generated from the git log. + +## Major changes +- Updated `scipy` requirement from `==1.8.*` to `>=1.8,<1.10`. +- Updated `igraph` to version `1.3.4`. + +## Minor changes +- Changed the mnnpy dependency to use a patch version instead of a specific commit hash. + +## Bug fixes +- Changed `docker_hash` to use the Docker API if `docker` is not available. +- Use `curl` to retrieve the Docker hash if `docker` fails. +- Fixed an issue with using `git+https` for `mnnpy`. + + +# openproblems v0.5.0 + +Note: This changelog was automatically generated from the git log. + +## Minor changes + +- Updated several R package dependencies. +- Updated several Python package dependencies. +- Added several new methods for spatial decomposition: RCTD, DestVI, Stereoscope. +- Added a new dataset for dimensionality reduction: Mouse hematopoietic stem cell differentiation. +- Improved documentation for tasks and datasets. + +## Bug fixes + +- Fixed a bug where the `lintr` package was not being installed correctly. +- Fixed a bug where the `BRANCH_PARSED` variable was not being properly sanitized in the `run_tests.yml` workflow. +- Fixed a bug in `_scanvi` and `_scvi` functions where the `max_epochs` parameter was not being passed to the `scanvi` and `scvi` functions. +- Fixed a bug in `install_renv.R` causing incorrect installation of packages from R repositories. +- Fixed an issue where the dependency upgrade script would fail to capture the output of the upgrade process. +- Fixed an issue where the dependency upgrade script would not correctly write updates to the requirements file. +- Fixed an issue where the `git_hash` function was not being called for external modules. +- Fixed a bug in `openproblems/tasks/denoising/methods/__init__.py` that prevented DCA from being used. +- Fixed a bug in `neuralee_default` where it could fail due to sparseness of data. +- Fixed a bug in `scanvi_all_genes` where the code version was not being set correctly. 
+- Fixed a bug in `scanvi_hvg` where the code version was not being set correctly. +- Fixed a bug in `scarches_scanvi_all_genes` where the code version was not being set correctly. +- Fixed a bug in `scarches_scanvi_hvg` where the code version was not being set correctly. + +## New functionality + +- Added a new denoising method called "DCA" based on a deep count autoencoder. +- Added `xgboost_log_cpm` and `xgboost_scran` methods to `openproblems.tasks.label_projection`. +- Added a new command-line interface for testing datasets, methods, and metrics. +- Added a new `install_renv.R` script to simplify the installation of the `renv` package. +- Added automated CI check to find and suggest available updates to R packages in docker images. +- Added a hash to the docker image that is based on the age of the code. +- Added data_reference to dataset metadata. +- Added `docker_hash` function to retrieve the docker image hash associated with an image. +- Added support for retrieving the docker image hash for R functions that have a defined `__r_file__`. + +## Documentation + +- Updated contributing guide to reflect the `main` branch as the default branch. +- Updated issue templates to reflect the `main` branch as the default branch. +- Updated pull request template to reflect the `main` branch as the default branch. + + +# openproblems v0.4.4 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a new docker image `openproblems-r-pytorch` for running Harmony in Python + +## Major changes +- Moved `harmony` to Python-based `harmony-pytorch` + +## Bug fixes +- Fixed an issue where `adata.var` was not being correctly handled in `_utils.py` +- Updated the documentation for the `openproblems-r-extras` docker image + + +# openproblems v0.4.3 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added PHATE with sqrt potential + +## Bug fixes +- Fixed path to R_HOME +- Fixed Dockerfile to use R 4.2 +- Minor CI fixes + + +# openproblems v0.4.2 + +Note: This changelog was automatically generated from the git log. + +## Minor changes +- Run scran pooling in series, not in parallel. + + +# openproblems v0.4.1 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added `FastMNN`, `Harmony`, and `Liger` methods for batch integration. +- Added `bbknn_full_unscaled` method. +- Added Dependabot configuration for pip and GitHub Actions dependencies. + +## Minor changes +- Updated dependencies: `scib`, `bbknn`, `scanorama`, `annoy`, and `mnnpy`. +- Improved the performance of several methods by pre-processing the data before running them. + +## Bug fixes +- Fixed bugs in `fastMNN`, `harmony`, `liger`, `scanorama`, `scanvi`, `scvi`, `mnn`, and `combat` that caused incorrect embedding. + + +# openproblems v0.4.0 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a new file `workflow/generate_website_markdown.py` to generate website markdown files for all tasks and datasets. +- Updated Nextflow version to v1.5. +- Updated Nextflow version to v1.6. + +## Major changes +- Added code version to the output of each method. +- Updated `nextflow` version to `v1.3`. +- Updated `nextflow` version to `v1.4`. +- Updated docker version to 20.10.15. +- Removed Docker setup from CI workflow. +- Updated Python version to 3.8.13. + +## Minor changes +- Updated dependencies for the Docker images. 
+- Updated pre-commit hooks to include `requirements-txt-fixer`. +- Updated Nextflow workflow to version 1.4. +- Updated the location of method versions in the results directory. +- Updated the Tower action ID. + +## Bug fixes +- Fixed a bug where Docker images were not properly pushed to Docker Hub. +- Updated `requirements.txt` files to fix dependency conflicts. +- Removed unnecessary dependencies from CI workflows to reduce disk space usage on GitHub runners. + + +# openproblems v0.3.5 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added new integration methods: BBKNN, Combat, FastMNN feature, FastMNN embed, Harmony, Liger, MNN, Scanorama feature, Scanorama embed, Scanvi, Scvi +- Added new metrics: graph_connectivity, iso_label_f1, nmi +- Added _utils.py with functions: hvg_batch, scale_batch +- Added `run_bbknn` function. +- Added a test for the trustworthiness metric, which now passes for sparse matrices. +- Added a test for the density preservation metric, which now passes against densmap for a reasonable degree of similarity. +- Added tests for all methods and metrics. +- Added a new workflow to automatically delete untagged images from the OpenProblems ECR repository. +- Added a new workflow to process results and create a PR to update the OpenProblems benchmark. +- Added support for running tests with the `process` extra in `setup.py`. +- Added `densmap` dimensionality reduction method. +- Added `neuralee` dimensionality reduction method. +- Added `alra` denoising method. +- Added `scarches_scanvi` label projection method. +- Added `bbknn` batch integration graph method. +- Added `beta` regulatory effect prediction method. +- Added a new `invite-contributors.yml` file to the repository. + +## Major changes +- The `test_methods.py` file has been simplified by removing unused arguments. +- The `test_metrics.py` file has been simplified by removing unused arguments. +- The `test_utils/docker.py` file has been modified to allow specifying the docker image as a decorator argument. +- Updated Nextflow version to 22.04.0. +- Modified the processing of Nextflow results to save them in a temporary directory. +- Modified `workflow/parse_nextflow.py` to parse results from Nextflow runs. +- Modified `.github/workflows/run_tests.yml` to cancel previous runs when a new commit is pushed. + +## Minor changes +- Removed `.nextflow`, `scratch/`, `openproblems/results/` and `openproblems/work/` from `.gitignore`. +- Updated `CONTRIBUTING.md` +- Methods should not edit `adata.obsm["train"]` or `adata.obsm["test"]`. +- Redirects stdout to stderr when running subcommands to ensure that output is printed correctly. +- Updated CI workflow to skip running tests on push if they failed on the `run_tester` job, unless the branch name starts with `test_benchmark`. +- Refactored the Neuralee method to use a separate function for embedding. +- Improved performance by using a default value for `maxit` in `fine_tune_kwargs`. +- Removed unnecessary code for storing raw counts in the `neuralee_default` method. + +# openproblems v0.3.4 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added CeNGEN, Tabula Muris Senis, and Pancreas datasets to the label_projection task. +- Added scANVI and scArches+scANVI methods to the label_projection task. +- Added majority_vote and random_labels baseline methods to the label_projection task. 
+- Added new methods: densMAP, NeuralEE, scvis +- Added new metrics: NN Ranking (continuity, co-KNN size, co-KNN AUC, Local continuity meta criterion, Local property metric, Global property metric) +- Added pre-processing function: log_cpm_hvg() +- Added support for custom pre-processing functions +- Added support for variants of methods +- Added a new batch integration task. +- Added a batch integration graph subtask. +- Added a batch integration embedding subtask. +- Added a batch integration corrected feature matrix subtask. +- Added ivis method for dimensionality reduction to openproblems. +- Added self-hosted runner support for `run_benchmark` workflow using Cirun.io +- Added a `--test` flag to the `run` subcommand, allowing for running a test version of a method. +- Added `test_load_dataset` to `test/test__load_data.py` to test loading and caching of datasets. +- Added `test_method` to `test/test_methods.py` to test application of methods. +- Added `test_trustworthiness_sparse` to `test/test_metrics.py` to test trustworthiness metric on sparse data. +- Added `test_density_preservation_matches_densmap` to `test/test_metrics.py` to test density preservation metric against densmap. +- Updated `test/utils/docker.py` to allow specifying the docker image as the last argument. +- Added `--test` flag to `run` subcommand to run the test version of a method. +- Added Docker image building to `run_tests.yml`. +- Added a new workflow to process Nextflow results +- Added a new workflow to run tests and benchmarks +- Added support for running benchmarks from tags +- Added support for running benchmarks from forks +- Added `openproblems-cli` command to run test-hash + + # openproblems v0.3.3 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added support for balanced SCOT alignment. + +## Minor changes +- Updated the workflow to store benchmark results in `/tmp`. + +## Bug fixes +- Fixed the parsing and committing of benchmark results on tag. +- Fixed the Github Actions badge link. +- Fixed the coverage badge. +- Fixed the benchmark commit. +- Ignored AWS warning and cleaned up S3 properly. +- Updated the workflow to continue on error for forks. + + +# openproblems v0.3.2 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added trustworthiness metric to the dimensionality reduction task. +- Added density preservation metric. +- Added several metrics based on nearest neighbor ranking: continuity, co-KNN size, co-KNN AUC, local continuity meta criterion, local property, global property. +- Added mouse blood data from Olsson et al. (2016) Nature to the `openproblems` dataset collection. +- Added a test mode to the `load_olsson_2016_mouse_blood` function. +- Added a dataset function for the `mouse_blood_olssen_labelled` dataset in the `openproblems.tasks.dimensionality_reduction.datasets` module. +- Added ALRA denoising method. +- Added support for the Single Cell Optimal Transport (SCOT) method for multimodal data integration. +- SCOT implements Gromov-Wasserstein optimal transport to align single-cell multi-omics data. +- Added four variations of SCOT: + - sqrt CPM unbalanced + - sqrt CPM balanced + - log scran unbalanced + - log scran balanced +- Each variation implements different normalization strategies for the input data. +- Added `scot` method to `openproblems.tasks.multimodal_data_integration.methods`. +- Added pre-processing to the `dimensionality_reduction` task. 
+- Added pre-processing to all `dimensionality_reduction` methods. +- Added Wagner_2018_zebrafish_embryo_CRISPR dataset loader +- Added PR review checklist to the pull request template. +- Added `cmake==3.18.4` to the `docker/openproblems-python-extras/requirements.txt` file. +- Added `--version` flag to print the version. +- Added `--test-hash` flag to print the current hash. +- Added basic help message. +- Added `install_renv.R` script for installing R packages in Docker images. +- Added `docker/.version` file to track Docker image version. +- Added a new docker image for running GitHub Actions. +- Added a new utils.git module to determine which tasks have changed relative to base/main. +- Added support for running benchmark tests on tags. +- Added a test directory for use in the workflow. + + +# openproblems v0.3.1 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added chromatin potential task +- Added PHATE to the dimensional_reduction task. +- Added support for testing docker builds on a separate branch. +- Added support for building images and pushing them to docker hub. +- Added support for writing methods in R using `scprep`'s `RFunction` class. +- Added a CLI interface to `openproblems`. +- Added `f1_micro` metric. +- Added `mlp_log_cpm` and `mlp_scran` methods for label projection. +- Added `pancreas_batch` and `pancreas_random` datasets for label projection. +- Added `f1` metric for label projection. +- Added metadata to methods and metrics. +- Added `openproblems.tools.decorators` for decorating methods and metrics. +- Added `openproblems.tools.normalize` for common normalization functions. +- Added methods for `logistic_regression`, `mlp`, `harmonic_alignment`, `mnn`, and `procrustes`. +- Added metrics for `accuracy`, `f1`, `knn_auc`, and `mse`. +- Added `openproblems.version` to provide package version. +- Added `dataset` decorator for registering datasets. +- Added `tools.decorators.profile` decorator to measure memory usage and runtime of methods. +- Added `tools.normalize` module to provide normalization functions. +- Added `tools.decorators.normalizer` decorator to normalize data prior to applying methods. +- Added a new "data loader" component that loads data in a way that's formatted correctly for a given task. +- Added CITE-seq Cord Blood Mononuclear Cells dataset. +- Added snakemake support for automatic evaluation. +- Added zebrafish data to label projection task. +- Added a new task, "Link gene expression with chromatin accessibility" +- Added a new dataset, "sciCAR Mouse Kidney with cell clusters" +- Added a new method, "BETA" +- Added a new metric, "Correlation between RNA and ATAC" +- Added a new task, "Dimensional reduction" +- Added human blood dataset from Nestorowa et al. Blood. 2016 +- Added 10x PBMC dataset +- Added `load_10x_5k_pbmc` function to load the 10x 5k PBMC dataset. + + +# openproblems v0.2.1 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added MLP method for label projection task. +- Added pancreas data loading to label projection task. + +## Minor changes +- Updated black. +- Updated test version of pancreas_batch to have test data. +- Added random pancreas train data. + +## Bug fixes +- Fixed zebrafish code duplication. +- Fixed pancreas import location. +- Fixed bug in zebrafish data. +- Fixed bug in pancreas import. +- Removed normalization from loader. +- Removed dummy and cheat metrics/datasets. 
+- Removed excess covariates from pancreas dataset. + + +# openproblems v0.2 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added zebrafish label projection task + +## Major changes +- Moved scIB, rpy2, harmonicalignment, and mnnpy to optional dependencies + +## Minor changes +- Improved n_components fix +- Moved URL into function for neater namespace + +## Bug fixes +- Fixed n_svd for truncatedSVD +- Fixed data loader +- Fixed n_pca problem +- Scaled without mean if sparse +- Scaled data for regression +- Added check to ensure that data has nonzero size + + +# openproblems v0.1 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added a results page to the website. +- Added a new zebrafish dataset to the openproblems library. +- Added netlify.toml to deploy website. + +## Documentation +- Updated documentation to reflect new features and datasets. + +## Major changes +- Bumped version to 0.1. + +## Minor changes +- Improved the website's home menu link. +- Improved website links. +- Updated website's hero and social links. +- Updated website's task cards. +- Updated the website's demo. +- Improved website's frontmatter. +- Separated frontmatter from content in website's Markdown files. +- Fixed black syntax. +- Excluded website from black. +- Updated website content to display results. +- Updated the Travis CI configuration to exclude website from black. + +## Bug fixes +- Fixed zebrafish data loader. + + +# openproblems v0.0.3 + +Note: This changelog was automatically generated from the git log. + +## New functionality +- Added harmonic alignment method. +- Added scicar datasets. +- Added logistic regression methods. +- Added ability to normalize obsm. +- Added test suite. +- Added normalization tools. + +## Documentation +- Updated documentation to reflect normalization changes. + +## Major changes +- Migrated normalizations to openproblems.tools.normalize. +- Updated dataset specification to require normalization in methods. +- Removed zebrafish dataset. +- Moved dataset test spec. +- Removed "mode2_raw" and "raw" from datasets. +- Added test dataset spec. +- Consolidated scicar datasets. +- Migrated references to github repo. + +## Minor changes +- Improved sparse array equality test. +- Improved sparse inequality check. +- Increased test data size. +- Normalized mode2. +- Fixed decorator. +- Used uns. +- Used functools.wraps. +- Updated name of log_scran_pooling function. +- Fixed storing normalization results. +- Fixed zebrafish load caching. +- Fixed zebrafish test. +- Added normalization functions. +- Updated logistic regression function to work with anndata properly. +- Fixed cheat method. +- Fixed git upload. +- Fixed Travis CI. +- Fixed harmonic alignment import. +- Increased test coverage. + +## Bug fixes +- Bugfix harmonic_alignment, closes #4. +- Bugfix harmonic alignment import. +- Normalized data inside methods, closes #19. +- Fix storing normalization results. +- Fixed zebrafish test. +- Fix zebrafish load caching. +- Fix decorator. +- Fix cheat method. +- Don't check for raw data -- we are no longer normalizing. + + +# openproblems v0.0.2 + +Note: This changelog was automatically generated from the git log. 
+ +## New functionality + +- Added dummy dataset to `openproblems/data` +- Added `load_dummy` function to `openproblems/data` +- Added `loader` decorator to `openproblems/data` +- Added loading functions for sciCAR datasets to `openproblems/data/scicar` +- Added `scicar_cell_lines` dataset to `openproblems/tasks/multimodal_data_integration/datasets` +- Added `scicar_mouse_kidney` dataset to `openproblems/tasks/multimodal_data_integration/datasets` +- Added `dummy` dataset to `openproblems/tasks/label_projection/datasets` + +## Major changes + +- Changed data structure for multimodal data integration tasks in `openproblems/tasks/multimodal_data_integration` +- Bumped version to 0.0.2 in `openproblems/version.py` +- Modified the way to run `evaluate.sh` in `.travis.yml` +- Added `chmod +x evaluate.sh` to `.travis.yml` + +## Documentation + +- Added documentation for adding a dataset to a task in `README.md` +- Added documentation for dataset loading in `README.md` +- Added documentation for adding a new dataset in `README.md` +- Updated documentation in `openproblems/tasks/multimodal_data_integration/README.md` +- Updated documentation in `openproblems/version.py` + +# openproblems v0.0.1 + +First release of OpenProblems. + +methods, 1 metric) +* Multimodal data integration (2 datasets, 2 methods, 2 metrics) diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000..45d257b29a --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,133 @@ + +# Contributor Covenant Code of Conduct + +## Our Pledge + +We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, caste, color, religion, or sexual +identity and orientation. + +We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community. + +## Our Standards + +Examples of behavior that contributes to a positive environment for our +community include: + +* Demonstrating empathy and kindness toward other people +* Being respectful of differing opinions, viewpoints, and experiences +* Giving and gracefully accepting constructive feedback +* Accepting responsibility and apologizing to those affected by our mistakes, + and learning from the experience +* Focusing on what is best not just for us as individuals, but for the overall + community + +Examples of unacceptable behavior include: + +* The use of sexualized language or imagery, and sexual attention or advances of + any kind +* Trolling, insulting or derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or email address, + without their explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Enforcement Responsibilities + +Community leaders are responsible for clarifying and enforcing our standards of +acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or harmful. 
+ +Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate. + +## Scope + +This Code of Conduct applies within all community spaces, and also applies when +an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail address, +posting via an official social media account, or acting as an appointed +representative at an online or offline event. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement at +[INSERT CONTACT METHOD]. +All complaints will be reviewed and investigated promptly and fairly. + +All community leaders are obligated to respect the privacy and security of the +reporter of any incident. + +## Enforcement Guidelines + +Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct: + +### 1. Correction + +**Community Impact**: Use of inappropriate language or other behavior deemed +unprofessional or unwelcome in the community. + +**Consequence**: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested. + +### 2. Warning + +**Community Impact**: A violation through a single incident or series of +actions. + +**Consequence**: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or permanent +ban. + +### 3. Temporary Ban + +**Community Impact**: A serious violation of community standards, including +sustained inappropriate behavior. + +**Consequence**: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban. + +### 4. Permanent Ban + +**Community Impact**: Demonstrating a pattern of violation of community +standards, including sustained inappropriate behavior, harassment of an +individual, or aggression toward or disparagement of classes of individuals. + +**Consequence**: A permanent ban from any sort of public interaction within the +community. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], +version 2.1, available at +[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1]. + +Community Impact Guidelines were inspired by +[Mozilla's code of conduct enforcement ladder][Mozilla CoC]. + +For answers to common questions about this code of conduct, see the FAQ at +[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at +[https://www.contributor-covenant.org/translations][translations]. 
+ +[homepage]: https://www.contributor-covenant.org +[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html +[Mozilla CoC]: https://github.com/mozilla/diversity +[FAQ]: https://www.contributor-covenant.org/faq +[translations]: https://www.contributor-covenant.org/translations diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000000..a141e7571d --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,24 @@ +# Contributing to OpenProblems + +[OpenProblems](https://openproblems.bio) is a community effort, and +everyone is welcome to contribute. This project is hosted on +[github.com/openproblems-bio/openproblems](https://github.com/openproblems-bio/openproblems). + +You can find a full in depth guide on how to contribute to this project +on the [OpenProblems website](https://openproblems.bio/documentation/). + +## Code of conduct + +We as members, contributors, and leaders pledge to make participation in +our community a harassment-free experience for everyone, regardless of +age, body size, visible or invisible disability, ethnicity, sex +characteristics, gender identity and expression, level of experience, +education, socio-economic status, nationality, personal appearance, +race, caste, color, religion, or sexual identity and orientation. + +We pledge to act and interact in ways that contribute to an open, +welcoming, diverse, inclusive, and healthy community. + +Our full [Code of Conduct](CODE_OF_CONDUCT.md) is adapted from the +[Contributor Covenant](https://www.contributor-covenant.org), version +2.1. diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000000..c7a5f287cb --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2020 OpenProblems + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md new file mode 100644 index 0000000000..daff800cdd --- /dev/null +++ b/README.md @@ -0,0 +1,14 @@ +[![](https://openproblems.bio/images/heros/home_hero_text.png)](https://openproblems.bio) + +------ + +Open Problems is a living, extensible, community-guided benchmarking platform. 
+ +Useful links: + +* [Introduction to Open Problems](https://openproblems.bio) +* [Our benchmarks](https://openproblems.bio/results) +* [Our datasets](https://openproblems.bio/datasets) +* [Our team and community](https://openproblems.bio/team) +* [Planned and past events](https://openproblems.bio/events) +* [How to contribute](https://openproblems.bio/documentation) diff --git a/_viash.yaml b/_viash.yaml new file mode 100644 index 0000000000..de6a7af122 --- /dev/null +++ b/_viash.yaml @@ -0,0 +1,14 @@ +viash_version: 0.8.6 + +source: src +target: target + +config_mods: | + .functionality.version := 'dev' + .platforms[.type == 'docker'].target_registry := 'ghcr.io' + .platforms[.type == 'docker'].target_organization := 'openproblems-bio/openproblems' + .platforms[.type == 'docker'].target_image_source := 'https://github.com/openproblems-bio/openproblems' + .platforms[.type == "nextflow"].directives.tag := "$id" + .platforms[.type == "nextflow"].auto.simplifyOutput := false + .platforms[.type == "nextflow"].config.labels := { lowmem : "memory = 20.Gb", midmem : "memory = 50.Gb", highmem : "memory = 100.Gb", lowcpu : "cpus = 5", midcpu : "cpus = 15", highcpu : "cpus = 30", lowtime : "time = 1.h", midtime : "time = 4.h", hightime : "time = 8.h", veryhightime : "time = 24.h" } + .platforms[.type == "nextflow"].config.script := "process.errorStrategy = 'ignore'" \ No newline at end of file diff --git a/main.nf b/main.nf new file mode 100644 index 0000000000..fd40518830 --- /dev/null +++ b/main.nf @@ -0,0 +1,3 @@ +workflow { + print("This is a dummy placeholder for pipeline execution. Please use the corresponding nf files for running pipelines.") +} diff --git a/nextflow.config b/nextflow.config new file mode 100644 index 0000000000..6402ebf273 --- /dev/null +++ b/nextflow.config @@ -0,0 +1 @@ +process.container = 'nextflow/bash:latest' diff --git a/scripts/sync_resources.sh b/scripts/sync_resources.sh new file mode 100755 index 0000000000..76e88e4a04 --- /dev/null +++ b/scripts/sync_resources.sh @@ -0,0 +1,5 @@ +#!/bin/bash + +set -e + +viash run src/common/sync_test_resources/config.vsh.yaml diff --git a/src/common/check_dataset_schema/config.vsh.yaml b/src/common/check_dataset_schema/config.vsh.yaml new file mode 100644 index 0000000000..08449c3e7d --- /dev/null +++ b/src/common/check_dataset_schema/config.vsh.yaml @@ -0,0 +1,45 @@ +functionality: + name: check_dataset_schema + namespace: common + description: Checks if the dataset has the necessary slots that are predefined in a schema. + argument_groups: + - name: Inputs + arguments: + - name: --input + type: file + required: true + description: A h5ad file. + - name: --schema + type: file + required: true + description: A schema file for the h5ad object. + - name: Arguments + arguments: + - name: --stop_on_error + type: boolean + default: false + description: Whether or not to stop with exit code 1 if the input file does not adhere to the schema. + - name: Output + arguments: + - name: --output + type: file + required: true + description: If specified, this file will contain a structured log of which checks succeeded (or not). 
+ example: checks.json + direction: output + resources: + - type: python_script + path: script.py + test_resources: + - path: /resources_test/common/pancreas + - type: python_script + path: test.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + test_setup: + - type: python + packages: viashpy + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/common/check_dataset_schema/script.py b/src/common/check_dataset_schema/script.py new file mode 100644 index 0000000000..cd84f9cdcf --- /dev/null +++ b/src/common/check_dataset_schema/script.py @@ -0,0 +1,60 @@ +import anndata as ad +import yaml +import json + +## VIASH START +par = { + 'input': 'work/d4/f4fabc8aa4f2308841d4ab57bcff62/_viash_par/input_1/dataset.h5ad', + 'schema': 'work/d4/f4fabc8aa4f2308841d4ab57bcff62/_viash_par/schema_1/schema.yaml', + 'stop_on_error': False, + 'output': 'work/d4/f4fabc8aa4f2308841d4ab57bcff62/out.yaml', +} +## VIASH END + +def check_structure(slot, slot_info, adata_slot): + missing = [] + if slot == "X": + slot_info["name"] = "X" + slot_info = [slot_info] + for obj in slot_info: + adata_data = adata_slot.get(obj['name']) if slot != 'X' else adata_slot + if obj.get('required') and adata_data is None: + missing.append(obj['name']) + # todo: check types + return missing + +print('Load data', flush=True) +adata = ad.read_h5ad(par['input']) + +# create data structure +out = { + "exit_code": 0, + "error": {}, + "data_schema": "ok" +} + +print("Check AnnData against schema", flush=True) +with open(par["schema"], "r") as f: + data_struct = yaml.safe_load(f) + +def_slots = data_struct['info']['slots'] + +out = { + "exit_code": 0, + "error": {}, + "data_schema": "ok" +} +for slot in def_slots: + print("Checking slot", slot, flush=True) + missing = check_structure(slot, def_slots[slot], getattr(adata, slot)) + if missing: + print(f"Dataset is missing {slot} {missing}", flush=True) + out['exit_code'] = 1 + out['data_schema'] = 'not ok' + out['error'][slot] = missing + +with open(par["output"], "w") as f: + json.dump(out, f, indent=2) + +if par['stop_on_error']: + exit(out['exit_code']) diff --git a/src/common/check_dataset_schema/test.py b/src/common/check_dataset_schema/test.py new file mode 100644 index 0000000000..1e7b5eb1e9 --- /dev/null +++ b/src/common/check_dataset_schema/test.py @@ -0,0 +1,98 @@ +import sys +import re +import pytest +import json +import subprocess + +## VIASH START +## VIASH END + +input_path = meta["resources_dir"] + "/pancreas/dataset.h5ad" + +@pytest.fixture +def schema(tmp_path): + schema = tmp_path / "schema.yaml" + schema.write_text(""" +type: file +description: "A preprocessed dataset" +example: "preprocessed.h5ad" +info: + label: "Preprocessed dataset" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true +""") + return schema + +@pytest.fixture +def error_schema(tmp_path): + schema = tmp_path / "schema.yaml" + schema.write_text(""" +type: file +description: "A preprocessed dataset" +example: "preprocessed.h5ad" +info: + label: "Preprocessed dataset" + slots: + X: + type: double + description: Normalized expression values + required: true + layers: + - type: integer + name: counts + description: Raw counts + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: error_test + description: "A 
made up uns variable to test if error is picked up" + required: true + """) + return schema + +def test_run(run_component, tmp_path, schema): + output_path = tmp_path / "checks.json" + + run_component([ + "--input", input_path, + "--schema", str(schema), + "--output", str(output_path) + ]) + + assert output_path.exists(), "Output path does not exist" + +def test_error(run_component, tmp_path, error_schema): + output_checks = tmp_path / "checks.json" + + with pytest.raises(subprocess.CalledProcessError) as err: + run_component([ + "--input", input_path, + "--schema", str(error_schema), + "--stop_on_error", "true", + "--output", str(output_checks) + ]) + assert err.value.exitcode > 0 + + assert output_checks.exists(), "Output checks file does not exist" + + with open(output_checks, "r") as f: + out = json.load(f) + assert out["exit_code"] > 0 + assert out["data_schema"] == "not ok" + + +if __name__ == "__main__": + sys.exit(pytest.main([__file__])) diff --git a/src/common/check_yaml_schema/config.vsh.yaml b/src/common/check_yaml_schema/config.vsh.yaml new file mode 100644 index 0000000000..b87bec5429 --- /dev/null +++ b/src/common/check_yaml_schema/config.vsh.yaml @@ -0,0 +1,26 @@ +functionality: + name: check_yaml_schema + namespace: common + description: Checks if a YAML file adheres to a custom schema file. + argument_groups: + - name: Inputs + arguments: + - name: --input + type: file + required: true + description: A yaml file. + - name: --schema + type: file + required: true + description: A schema file for the yaml file. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - jsonschema + - type: nextflow diff --git a/src/common/check_yaml_schema/script.py b/src/common/check_yaml_schema/script.py new file mode 100644 index 0000000000..2058832bb2 --- /dev/null +++ b/src/common/check_yaml_schema/script.py @@ -0,0 +1,59 @@ +import jsonschema +import yaml +from pathlib import Path + +## VIASH START +par = { + 'input': 'src/tasks/batch_integration/methods/bbknn/config.vsh.yaml', + 'schema': 'src/common/api/schema_task_method.yaml' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +def yaml_to_dict(file_path): + with open(file_path, 'r') as stream: + try: + return yaml.safe_load(stream) + except yaml.YAMLError as exc: + print(exc) + +def load_schemas(schema_dir): + schema_files = list(schema_dir.glob("./**/schema_*.yaml")) + + schemas = {} + for file in schema_files: + schema = yaml_to_dict(file) + schemas[file.absolute()] = schema + + return schemas + +def create_validator(schema_name, schemas): + schema_store = {} + for name, value in schemas.items(): + schema_store[f"file://{name}"] = value + + # Setting the first schema as the main schema + + main_schema = schemas[schema_name] + resolver = jsonschema.RefResolver( + base_uri=f"file://{schema_name}", + referrer=main_schema, + store=schema_store + ) + + return jsonschema.Draft7Validator(main_schema, resolver=resolver) + +print(">> Read input yaml", flush=True) +input_yaml_file = Path(par["input"]) +with open(input_yaml_file, 'r') as f: + input_yaml = yaml.safe_load(f) + +print(">> Read schema(s)", flush=True) +schema_yaml_file = Path(par["schema"]) +schemas = load_schemas(schema_yaml_file.parent) + +print(">> Validate input yaml against schema", flush=True) +validator = create_validator(schema_yaml_file.absolute(), schemas) +validator.validate(input_yaml) diff --git a/src/common/comp_tests/check_get_info.py 
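The schema files consumed by `check_yaml_schema` are ordinary JSON Schema documents stored as YAML, so the heart of the component is just `yaml.safe_load` plus a `jsonschema` validator. Below is a minimal sketch of that idea with a made-up schema and config; it ignores the cross-file reference resolution that `create_validator` handles via `RefResolver`.

```python
import yaml
import jsonschema

# Made-up schema in the same style as the schema_*.yaml files:
# a plain JSON Schema document, just written as YAML.
schema = yaml.safe_load("""
type: object
required: [functionality]
properties:
  functionality:
    type: object
    required: [name, namespace]
    properties:
      name: {type: string}
      namespace: {type: string}
""")

# Made-up config to validate.
config = yaml.safe_load("""
functionality:
  name: bbknn
  namespace: batch_integration/methods
""")

# validate() raises jsonschema.ValidationError when the document does not
# adhere to the schema, which makes the component exit with a non-zero status.
jsonschema.Draft7Validator(schema).validate(config)
print("config adheres to schema")
```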
b/src/common/comp_tests/check_get_info.py new file mode 100644 index 0000000000..a00f1d702d --- /dev/null +++ b/src/common/comp_tests/check_get_info.py @@ -0,0 +1,37 @@ +import subprocess +from os import path +import json + +## VIASH START +## VIASH END + +input_path = meta["resources_dir"] + "/test_file.yaml" +task_id = "denoising" +output_path = "output.json" + +cmd = [ + meta['executable'], + "--input", input_path, + "--task_id", task_id, + "--output", output_path, +] + +print(">> Running script as test", flush=True) +out = subprocess.run(cmd, stderr=subprocess.STDOUT) + +if out.stdout: + print(out.stdout) + +if out.returncode: + print(f"script: '{cmd}' exited with an error.") + exit(out.returncode) + +print(">> Checking whether output file exists", flush=True) +assert path.exists(output_path), "Output does not exist" + +print(">> Reading json file", flush=True) +with open(output_path, 'r') as f: + out = json.load(f) + print(out) + +print("All checks succeeded!", flush=True) \ No newline at end of file diff --git a/src/common/comp_tests/check_method_config.py b/src/common/comp_tests/check_method_config.py new file mode 100644 index 0000000000..a30111d648 --- /dev/null +++ b/src/common/comp_tests/check_method_config.py @@ -0,0 +1,132 @@ +import yaml + +## VIASH START +meta = { + "config" : "foo" +} +## VIASH END + + +NAME_MAXLEN = 50 + +SUMMARY_MAXLEN = 400 + +DESCRIPTION_MAXLEN = 5000 + +_MISSING_DOIS = ["vandermaaten2008visualizing", "hosmer2013applied"] + +TIME_LABELS = ["lowtime", "midtime", "hightime", "veryhightime"] +MEM_LABELS = ["lowmem", "midmem", "highmem"] +CPU_LABELS = ["lowcpu", "midcpu", "highcpu"] + +def _load_bib(): + with open(f"{meta['resources_dir']}/library.bib", "r") as file: + return file.read() + +def check_url(url): + import requests + from urllib3.util.retry import Retry + from requests.adapters import HTTPAdapter + + # configure retry strategy + session = requests.Session() + retry = Retry(connect=3, backoff_factor=0.5) + adapter = HTTPAdapter(max_retries=retry) + session.mount('http://', adapter) + session.mount('https://', adapter) + + get = session.head(url) + + if get.ok or get.status_code == 429: # 429 rejected, too many requests + return True + else: + return False + +def search_ref_bib(reference): + import re + bib = _load_bib() + + entry_pattern = r"(@\w+{[^}]*" + reference + r"[^}]*}(.|\n)*?)(?=@)" + + bib_entry = re.search(entry_pattern, bib) + + if bib_entry: + + type_pattern = r"@(.*){" + reference + doi_pattern = r"(?=[Dd][Oo][Ii]\s*=\s*{([^,}]+)})" + + entry_type = re.search(type_pattern, bib_entry.group(1)) + + if not (entry_type.group(1) == "misc" or reference in _MISSING_DOIS): + entry_doi = re.search(doi_pattern, bib_entry.group(1)) + assert entry_doi.group(1), "doi not found in bibtex reference" + url = f"https://doi.org/{entry_doi.group(1)}" + assert check_url(url), f"{url} is not reachable, ref= {reference}." + + return True + + else: + return False + +print("Load config data", flush=True) +with open(meta["config"], "r") as file: + config = yaml.safe_load(file) + +print("Check general fields", flush=True) +assert len(config["functionality"]["name"]) <= NAME_MAXLEN, f"Component id (.functionality.name) should not exceed {NAME_MAXLEN} characters." 
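For illustration, here is how the `entry_pattern` and `doi_pattern` regexes in `search_ref_bib()` pull a DOI out of a BibTeX library. The library contents and entry keys below are made up; the real check then probes `https://doi.org/<doi>` with `check_url`.

```python
import re

# Made-up BibTeX library. The (?=@) lookahead in entry_pattern needs a
# following entry to terminate the match, hence the second dummy entry.
bib = """@article{doe2021example,
  title = {An example paper},
  doi = {10.1234/example.doi},
}
@misc{placeholder2020,
  title = {Placeholder},
}
"""

reference = "doe2021example"

# Same patterns as used in search_ref_bib()
entry_pattern = r"(@\w+{[^}]*" + reference + r"[^}]*}(.|\n)*?)(?=@)"
doi_pattern = r"(?=[Dd][Oo][Ii]\s*=\s*{([^,}]+)})"

bib_entry = re.search(entry_pattern, bib)              # the whole @article{...} block
entry_doi = re.search(doi_pattern, bib_entry.group(1))
print(entry_doi.group(1))                              # -> 10.1234/example.doi
```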
+assert "namespace" in config["functionality"] is not None, "namespace not a field or is empty" + +print("Check info fields", flush=True) +info = config['functionality']['info'] +assert "type" in info, "type not an info field" +info_types = ["method", "control_method"] +assert info["type"] in info_types , f"got {info['type']} expected one of {info_types}" +assert "label" in info is not None, "label not an info field or is empty" +assert "summary" in info is not None, "summary not an info field or is empty" +assert "FILL IN:" not in info["summary"], "Summary not filled in" +assert len(info["summary"]) <= SUMMARY_MAXLEN, f"Component id (.functionality.info.summary) should not exceed {SUMMARY_MAXLEN} characters." +assert "description" in info is not None, "description not an info field or is empty" +assert "FILL IN:" not in info["description"], "description not filled in" +assert len(info["description"]) <= DESCRIPTION_MAXLEN, f"Component id (.functionality.info.description) should not exceed {DESCRIPTION_MAXLEN} characters." +if info["type"] == "method": + assert "reference" in info, "reference not an info field" + bib = _load_bib() + if info["reference"]: + reference = info["reference"] + if not isinstance(reference, list): + reference = [reference] + for ref in reference: + assert search_ref_bib(ref), f"reference {ref} not added to library.bib" + assert "documentation_url" in info is not None, "documentation_url not an info field or is empty" + assert "repository_url" in info is not None, "repository_url not an info field or is empty" + assert check_url(info["documentation_url"]), f"{info['documentation_url']} is not reachable" + assert check_url(info["repository_url"]), f"{info['repository_url']} is not reachable" + +if "variants" in info: + arg_names = [arg["name"].replace("--", "") for arg in config["functionality"]["arguments"]] + ["preferred_normalization"] + + for paramset_id, paramset in info["variants"].items(): + if paramset: + for arg_id in paramset: + assert arg_id in arg_names, f"Argument '{arg_id}' in `.functionality.info.variants['{paramset_id}']` is not an argument in `.functionality.arguments`." + +assert "preferred_normalization" in info, "preferred_normalization not an info field" +norm_methods = ["log_cpm", "log_cp10k", "counts", "log_scran_pooling", "sqrt_cpm", "sqrt_cp10k", "l1_sqrt"] +assert info["preferred_normalization"] in norm_methods, "info['preferred_normalization'] not one of '" + "', '".join(norm_methods) + "'." 
+ +print("Check platform fields", flush=True) +platforms = config['platforms'] +for platform in platforms: + if not platform["type"] == "nextflow": + continue + nextflow= platform + +assert nextflow, "nextflow not a platform" +assert nextflow["directives"], "directives not a field in nextflow platform" +assert nextflow["directives"]["label"], "label not a field in nextflow platform directives" + +assert [i for i in nextflow["directives"]["label"] if i in TIME_LABELS], "time label not filled in" +assert [i for i in nextflow["directives"]["label"] if i in MEM_LABELS], "mem label not filled in" +assert [i for i in nextflow["directives"]["label"] if i in CPU_LABELS], "cpu label not filled in" + +print("All checks succeeded!", flush=True) diff --git a/src/common/comp_tests/check_metric_config.py b/src/common/comp_tests/check_metric_config.py new file mode 100644 index 0000000000..45fa1efc2b --- /dev/null +++ b/src/common/comp_tests/check_metric_config.py @@ -0,0 +1,139 @@ +import yaml +from typing import Dict + +## VIASH START + +meta = { + "config" : "foo" +} + +## VIASH END + +NAME_MAXLEN = 50 + +SUMMARY_MAXLEN = 400 + +DESCRIPTION_MAXLEN = 5000 + +_MISSING_DOIS = ["vandermaaten2008visualizing", "hosmer2013applied"] + +TIME_LABELS = ["lowtime", "midtime", "hightime"] +MEM_LABELS = ["lowmem", "midmem", "highmem"] +CPU_LABELS = ["lowcpu", "midcpu", "highcpu"] + + +def _load_bib(): + bib_path = meta["resources_dir"]+"/library.bib" + with open(bib_path, "r") as file: + return file.read() + +def check_url(url): + import requests + from urllib3.util.retry import Retry + from requests.adapters import HTTPAdapter + + # configure retry strategy + session = requests.Session() + retry = Retry(connect=3, backoff_factor=0.5) + adapter = HTTPAdapter(max_retries=retry) + session.mount('http://', adapter) + session.mount('https://', adapter) + + get = session.head(url) + + if get.ok or get.status_code == 429: # 429 rejected, too many requests + return True + else: + return False + +def search_ref_bib(reference): + import re + bib = _load_bib() + + entry_pattern = r"(@\w+{[^}]*" + reference + r"[^}]*}(.|\n)*?)(?=@)" + + bib_entry = re.search(entry_pattern, bib) + + if bib_entry: + + type_pattern = r"@(.*){" + reference + doi_pattern = r"(?=[Dd][Oo][Ii]\s*=\s*{([^,}]+)})" + + entry_type = re.search(type_pattern, bib_entry.group(1)) + + if not (entry_type.group(1) == "misc" or reference in _MISSING_DOIS): + entry_doi = re.search(doi_pattern, bib_entry.group(1)) + assert entry_doi.group(1), "doi not found in bibtex reference" + url = f"https://doi.org/{entry_doi.group(1)}" + assert check_url(url), f"{url} is not reachable, ref= {reference}." + + return True + + else: + return False + +def check_metric(metric: Dict[str, str]) -> str: + assert "name" in metric is not None, "name not a field or is empty" + assert len(metric["name"]) <= NAME_MAXLEN, f"Component id (.functionality.info.metrics.metric.name) should not exceed {NAME_MAXLEN} characters." + assert "label" in metric is not None, "label not a field in metric or is empty" + assert "summary" in metric is not None, "summary not a field in metric or is empty" + assert "FILL IN:" not in metric["summary"], "Summary not filled in" + assert len(metric["summary"]) <= SUMMARY_MAXLEN, f"Component id (.functionality.info.metrics.metric.summary) should not exceed {SUMMARY_MAXLEN} characters." 
+ assert "description" in metric is not None, "description not a field in metric or is empty" + assert len(metric["description"]) <= DESCRIPTION_MAXLEN, f"Component id (.functionality.info.metrics.metric.description) should not exceed {DESCRIPTION_MAXLEN} characters." + assert "FILL IN:" not in metric["description"], "description not filled in" + # assert "reference" in metric, "reference not a field in metric" + if "reference" in metric: + reference = metric["reference"] + if not isinstance(reference, list): + reference = [reference] + for ref in reference: + assert search_ref_bib(ref), f"reference {ref} not added to library.bib" + # assert "documentation_url" in metric , "documentation_url not a field in metric" + # assert "repository_url" in metric , "repository_url not a metric field" + if "documentation_url" in metric: + assert check_url(metric["documentation_url"]), f"{metric['documentation_url']} is not reachable" + if "repository_url" in metric: + assert check_url(metric["repository_url"]), f"{metric['repository_url']} is not reachable" + assert "min" in metric is not None, f"min not a field in metric or is emtpy" + assert "max" in metric is not None, f"max not a field in metric or is empty" + assert "maximize" in metric is not None, f"maximize not a field in metric or is emtpy" + assert isinstance(metric['min'], (int, str)), "not an int or string (-.inf)" + assert isinstance(metric['max'], (int, str)), "not an int or string (+.inf)" + assert isinstance(metric['maximize'], bool) or metric["maximize"] not in ["-inf", "+inf"], "not a bool" + + +print("Load config data", flush=True) +with open(meta["config"], "r") as file: + config = yaml.safe_load(file) + +print("check general fields", flush=True) +assert "name" in config["functionality"] is not None, "Name not a field or is empty" +assert len(config["functionality"]["name"]) <= NAME_MAXLEN, f"Component id (.functionality.name) should not exceed {NAME_MAXLEN} characters." 
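For comparison with the method checks, a hypothetical `.functionality.info.metrics` entry (again shown as the parsed dictionary) that `check_metric()` above would accept. The reference key and URLs are placeholders, and the assertion messages suggest the strings "-.inf" / "+.inf" may be used for unbounded `min` / `max`:

```python
metric = {
    "name": "my_metric",
    "label": "My Metric",
    "summary": "A one-sentence summary of what the metric measures.",
    "description": "A longer, multi-line description of how the metric is computed.",
    "reference": "doe2021example",                    # placeholder bibtex key
    "documentation_url": "https://example.com/docs",  # placeholder URL
    "repository_url": "https://github.com/example-org/example-repo",
    "min": 0,
    "max": 1,
    "maximize": True,
}
```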
+assert "namespace" in config["functionality"] is not None, "namespace not a field or is empty" + + +print("Check info fields", flush=True) +info = config['functionality']['info'] +assert "type" in info, "type not an info field" +assert info["type"] == "metric" , f"got {info['type']} expected 'metric'" +assert "metrics" in info, "metrics not an info field" +for metric in info["metrics"]: + check_metric(metric) + +print("Check platform fields", flush=True) +platforms = config['platforms'] +for platform in platforms: + if not platform["type"] == "nextflow": + continue + nextflow= platform + +assert nextflow, "nextflow not a platform" +assert nextflow["directives"], "directives not a field in nextflow platform" +assert nextflow["directives"]["label"], "label not a field in nextflow platform directives" + +assert [i for i in nextflow["directives"]["label"] if i in TIME_LABELS], "time label not filled in" +assert [i for i in nextflow["directives"]["label"] if i in MEM_LABELS], "mem label not filled in" +assert [i for i in nextflow["directives"]["label"] if i in CPU_LABELS], "cpu label not filled in" + +print("All checks succeeded!", flush=True) diff --git a/src/common/comp_tests/run_and_check_adata.py b/src/common/comp_tests/run_and_check_adata.py new file mode 100644 index 0000000000..d2cda5af94 --- /dev/null +++ b/src/common/comp_tests/run_and_check_adata.py @@ -0,0 +1,127 @@ +import anndata as ad +import subprocess +from os import path +import yaml +import re + +## VIASH START +meta = { + "executable": "target/docker/denoising/methods/dca/dca", + "config": "target/docker/denoising/methods/dca/.config.vsh.yaml", + "resources_dir": "resources_test/denoising" +} +## VIASH END + +# helper functions +def check_slots(adata, arg): + """Check whether an AnnData file contains all for the required + slots in the corresponding .info.slots field. 
+ """ + for struc_name, slot_items in arg["info"].get("slots", {}).items(): + struc_x = getattr(adata, struc_name) + + if struc_name == "X": + if slot_items.get("required", True): + assert struc_x is not None,\ + f"File '{arg['value']}' is missing slot .{struc_name}" + + else: + for slot_item in slot_items: + if slot_item.get("required", True): + assert slot_item["name"] in struc_x,\ + f"File '{arg['value']}' is missing slot .{struc_name}['{slot_item['name']}']" + +def run_and_check(arguments, cmd): + print(">> Checking whether input files exist", flush=True) + for arg in arguments: + if arg["type"] == "file" and arg["direction"] == "input": + assert path.exists(arg["value"]), f"Input file '{arg['value']}' does not exist" + + print(f">> Running script as test", flush=True) + out = subprocess.run(cmd, stderr=subprocess.STDOUT) + + if out.stdout: + print(out.stdout) + + if out.returncode: + print(f"script: \'{' '.join(cmd)}\' exited with an error.") + exit(out.returncode) + + print(">> Checking whether output file exists", flush=True) + for arg in arguments: + if arg["type"] == "file" and arg["direction"] == "output": + assert path.exists(arg["value"]), f"Output file '{arg['value']}' does not exist" + + print(">> Reading h5ad files and checking formats", flush=True) + adatas = {} + for arg in arguments: + if arg["type"] == "file" and "slots" in arg["info"]: + print(f"Reading and checking {arg['clean_name']}", flush=True) + adata = ad.read_h5ad(arg["value"]) + + print(f" {adata}") + + check_slots(adata, arg) + + adatas[arg["clean_name"]] = adata + + print("All checks succeeded!", flush=True) + + +# read viash config +with open(meta["config"], "r") as file: + config = yaml.safe_load(file) + +# get resources +arguments = [] + +for arg in config["functionality"]["arguments"]: + new_arg = arg.copy() + arg_info = new_arg.get("info") or {} + + # set clean name + clean_name = re.sub("^--", "", arg["name"]) + new_arg["clean_name"] = clean_name + + # use example to find test resource file + if arg["type"] == "file": + if arg["direction"] == "input": + value = f"{meta['resources_dir']}/{arg['example'][0]}" + else: + value = f"{clean_name}.h5ad" + new_arg["value"] = value + elif "test_default" in arg_info: + new_arg["value"] = arg_info["test_default"] + + arguments.append(new_arg) + + +if "test_setup" not in config["functionality"]["info"]: + argument_sets = {"run": arguments} +else: + test_setup = config["functionality"]["info"]["test_setup"] + argument_sets = {} + for name, test_instance in test_setup.items(): + new_arguments = [] + for arg in arguments: + new_arg = arg.copy() + if arg["clean_name"] in test_instance: + val = test_instance[arg["clean_name"]] + if new_arg["type"] == "file" and new_arg["direction"] == "input": + val = f"{meta['resources_dir']}/{val}" + new_arg["value"] = val + new_arguments.append(new_arg) + argument_sets[name] = new_arguments + +for argset_name, argset_args in argument_sets.items(): + print(f">> Running test '{argset_name}'", flush=True) + # construct command + cmd = [ meta["executable"] ] + for arg in argset_args: + if "value" in arg: + value = arg["value"] + if arg["multiple"] and isinstance(value, list): + value = arg["multiple_sep"].join(value) + cmd.extend([arg["name"], str(value)]) + + run_and_check(argset_args, cmd) \ No newline at end of file diff --git a/src/common/create_component/config.vsh.yaml b/src/common/create_component/config.vsh.yaml new file mode 100644 index 0000000000..b8dc748fb6 --- /dev/null +++ b/src/common/create_component/config.vsh.yaml @@ 
-0,0 +1,71 @@ +functionality: + name: create_component + namespace: common + description: | + Create a component Viash component. + + Usage: + ``` + bin/create_component --task denoising --type method --language r --name foo + bin/create_component --task denoising --type metric --language python --name bar + ``` + arguments: + - type: string + name: --task + description: Which task the component will be added to. + example: denoising + - type: string + name: --type + example: metric + description: The type of component to create. Typically must be one of 'method', 'control_method' or 'metric'. + - type: string + name: --language + description: Which scripting language to use. Options are 'python', 'r'. + default: python + choices: [python, r] + - type: string + name: --name + example: new_comp + description: Name of the new method, formatted in snake case. + - type: file + name: --output + direction: output + # required: true + description: Path to the component directory. Suggested location is `src//s/`. + default: src/tasks/${VIASH_PAR_TASK}/${VIASH_PAR_TYPE}s/${VIASH_PAR_NAME} + - type: file + name: --api_file + description: | + Which API file to use. Defaults to `src//api/comp_.yaml`. + In tasks with different subtypes of method, this location might not exist and you might need + to manually specify a different API file to inherit from. + must_exist: false + # required: true + default: src/tasks/${VIASH_PAR_TASK}/api/comp_${VIASH_PAR_TYPE}.yaml + - type: file + name: --viash_yaml + description: | + Path to the project config file. Needed for knowing the relative location of a file to the project root. + # required: true + default: "_viash.yaml" + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/read_and_merge_yaml.py + test_resources: + - type: python_script + path: test.py + - path: /src + dest: openproblems/src + - path: /_viash.yaml + dest: openproblems/_viash.yaml +platforms: + - type: docker + image: python:3.10-slim + setup: + - type: python + pypi: ruamel.yaml + - type: native + - type: nextflow + + diff --git a/src/common/create_component/script.py b/src/common/create_component/script.py new file mode 100644 index 0000000000..8c954a66d4 --- /dev/null +++ b/src/common/create_component/script.py @@ -0,0 +1,476 @@ +from typing import Any +from pathlib import Path +import sys +import os +import re + +## VIASH START +par = { + "task": "denoising", + "type": "method", + "language": "python", + "name": "new_comp", + "output": "src/tasks/denoising/methods/new_comp", + "api_file": "src/tasks/denoising/api/comp_method.yaml", + "viash_yaml": "_viash.yaml" +} +## VIASH END + +# import helper function +sys.path.append(meta["resources_dir"]) +from read_and_merge_yaml import read_and_merge_yaml + +def strip_margin(text: str) -> str: + return re.sub("(^|\n)[ \t]*\|", "\\1", text) + +def create_config(par, component_type, pretty_name, script_path) -> str: + info_str = generate_info(par, component_type, pretty_name) + resources_str = generate_resources(par, script_path) + docker_platform = generate_docker_platform(par) + + return strip_margin(f'''\ + |# The API specifies which type of component this is. + |# It contains specifications for: + |# - The input/output files + |# - Common parameters + |# - A unit test + |__merge__: {os.path.relpath(par["api_file"], par["output"])} + | + |functionality: + | # A unique identifier for your component (required). + | # Can contain only lowercase letters or underscores. 
+ | name: {par["name"]} + | + | # Metadata for your component + | info: + |{info_str} + | # Component-specific parameters (optional) + | # arguments: + | # - name: "--n_neighbors" + | # type: "integer" + | # default: 5 + | # description: Number of neighbors to use. + | + | # Resources required to run the component + | resources: + |{resources_str} + |platforms: + | # Specifications for the Docker image for this component. + |{docker_platform} + | # This platform allows running the component natively + | - type: native + | # Allows turning the component into a Nextflow module / pipeline. + | - type: nextflow + | directives: + | label: [midtime,midmem, midcpu] + |''' + ) + +def generate_info(par, component_type, pretty_name) -> str: + """Generate the functionality info for a component.""" + if component_type in ["method", "control_method"]: + str = strip_margin(f'''\ + | # A relatively short label, used when rendering visualisarions (required) + | label: {pretty_name} + | # A one sentence summary of how this method works (required). Used when + | # rendering summary tables. + | summary: "FILL IN: A one sentence summary of this method." + | # A multi-line description of how this component works (required). Used + | # when rendering reference documentation. + | description: | + | FILL IN: A (multi-line) description of how this method works. + | # Which normalisation method this component prefers to use (required). + | preferred_normalization: log_cp10k + |''') + if component_type == "method": + str += strip_margin(f'''\ + | # A reference key from the bibtex library at src/common/library.bib (required). + | reference: bibtex_reference_key + | # URL to the documentation for this method (required). + | documentation_url: https://url.to/the/documentation + | # URL to the code repository for this method (required). + | repository_url: https://github.com/organisation/repository + |''') + return str + elif component_type == "metric": + return strip_margin(f'''\ + | metrics: + | # A unique identifier for your metric (required). + | # Can contain only lowercase letters or underscores. + | name: {par["name"]} + | # A relatively short label, used when rendering visualisarions (required) + | label: {pretty_name} + | # A one sentence summary of how this metric works (required). Used when + | # rendering summary tables. + | summary: "FILL IN: A one sentence summary of this metric." + | # A multi-line description of how this component works (required). Used + | # when rendering reference documentation. + | description: | + | FILL IN: A (multi-line) description of how this metric works. + | # A reference key from the bibtex library at src/common/library.bib (required). + | reference: bibtex_reference_key + | # URL to the documentation for this metric (required). + | documentation_url: https://url.to/the/documentation + | # URL to the code repository for this metric (required). 
+ | repository_url: https://github.com/organisation/repository + | # The minimum possible value for this metric (required) + | min: 0 + | # The maximum possible value for this metric (required) + | max: 1 + | # Whether a higher value represents a 'better' solution (required) + | maximize: true + |''') + + +def generate_resources(par, script_path) -> str: + """Add the script to the functionality resources.""" + if par["language"] == "python": + type_str = "python_script" + elif par["language"] == "r": + type_str = "r_script" + + return strip_margin(f'''\ + | # The script of your component (required) + | - type: {type_str} + | path: {script_path} + | # Additional resources your script needs (optional) + | # - type: file + | # path: weights.pt + |''') + +def generate_docker_platform(par) -> str: + """Set up the docker platform for Python.""" + if par["language"] == "python": + image_str = "openproblems/base_python:1.0.0" + setup_type = "python" + package_example = "scib==1.1.5" + elif par["language"] == "r": + image_str = "openproblems/base_r:1.0.0" + setup_type = "r" + package_example = "tidyverse" + return strip_margin(f'''\ + | - type: docker + | image: {image_str} + | # Add custom dependencies here (optional). For more information, see + | # https://viash.io/reference/config/platforms/docker/#setup . + | # setup: + | # - type: {setup_type} + | # packages: {package_example} + |''') + +def set_par_values(config) -> None: + """Adds values to each of the arguments in a config file.""" + args = config['functionality']['arguments'] + for argi, arg in enumerate(args): + key = re.sub("^-*", "", arg['name']) + + # find value + if arg["type"] != "file": + value = arg.get("default", arg.get("example", "...")) + elif arg.get("direction", "input") == "input": + key_strip = key.replace("input_", "") + value = f'resources_test/{par["task"]}/pancreas/{key_strip}.h5ad' + else: + key_strip = key.replace("output_", "") + value = f'{key_strip}.h5ad' + + # store key and value + config['functionality']['arguments'][argi]["key"] = key + config['functionality']['arguments'][argi]["value"] = value + +def look_for_adata_arg(args, uns_field): + """Look for an argument that has a .uns[uns_field] in its info.slots.""" + for arg in args: + uns = arg.get("info", {}).get("slots", {}).get("uns", []) + for unval in uns: + if unval.get("name") == uns_field: + return arg["key"] + return "adata" + +def write_output_python(arg, copy_from_adata, is_metric): + """Create code for writing the output h5ad files.""" + slots = arg.get("info", {}).get("slots", {}) + outer = [] + for group_name, slots in slots.items(): + inner = [] + for slot in slots: + if group_name == "uns" and slot["name"] in ["dataset_id", "normalization_id"]: + value = f"{copy_from_adata}.uns['{slot['name']}']" + elif group_name == "uns" and slot["name"] == "method_id": + if is_metric: + value = f"{copy_from_adata}.uns['{slot['name']}']" + else: + value = "meta['functionality_name']" + else: + value = group_name + "_" + slot["name"] + inner.append(f"'{slot['name']}': {value}") + inner_values = ',\n '.join(inner) + outer.append(f"{group_name}={{\n {inner_values}\n }}") + outer_values = ',\n '.join(outer) + return strip_margin( + f'''\ + |print("Write {arg["key"]} AnnData to file", flush=True) + |{arg["key"]} = ad.AnnData( + | {outer_values} + |) + |{arg["key"]}.write_h5ad(par['{arg["key"]}'], compression='gzip')''' + ) + +def write_output_r(arg, copy_from_adata, is_metric): + """Create code for writing the output h5ad files.""" + slots = arg.get("info", 
{}).get("slots", {}) + outer = [] + for group_name, slots in slots.items(): + inner = [] + for slot in slots: + if group_name == "uns" and slot["name"] in ["dataset_id", "normalization_id"]: + value = f"{copy_from_adata}$uns[[\"{slot['name']}\"]]" + elif group_name == "uns" and slot["name"] == "method_id": + if is_metric: + value = f"{copy_from_adata}$uns[[\"{slot['name']}\"]]" + else: + value = "meta[[\"functionality_name\"]]" + else: + value = group_name + "_" + slot["name"] + inner.append(f"{slot['name']} = {value}") + inner_values = ',\n '.join(inner) + outer.append(f"{group_name} = list(\n {inner_values}\n )") + outer_values = ',\n '.join(outer) + return strip_margin( + f'''\ + |cat("Write {arg["key"]} AnnData to file\\n") + |{arg["key"]} <- anndata::AnnData( + | {outer_values} + |) + |{arg["key"]}$write_h5ad(par[["{arg["key"]}"]], compression = "gzip")''' + ) + +def create_python_script(par, config, type): + args = config['functionality']['arguments'] + + # create the arguments of the par string + par_string = ",\n ".join(f"'{arg['key']}': '{arg['value']}'" for arg in args) + + # create code for reading the input h5ad file + read_h5ad_string = "\n".join( + f"{arg['key']} = ad.read_h5ad(par['{arg['key']}'])" + for arg in args + if arg['type'] == "file" + and arg.get('direction', "input") == "input" + ) + + # determine which adata to copy from + copy_from_adata = look_for_adata_arg(args, "method_id" if type == "metric" else "dataset_id") + + # create code for writing the output h5ad files + write_h5ad_string = "\n".join( + write_output_python(arg, copy_from_adata, type == "metric") + for arg in args + if arg["type"] == "file" + and arg.get("direction", "input") == "output" + ) + + if type == 'metric': + processing_string = strip_margin(f'''\ + |print('Compute metrics', flush=True) + |# metric_ids and metric_values can have length > 1 + |# but should be of equal length + |uns_metric_ids = [ '{par['name']}' ] + |uns_metric_values = [ 0.5 ]''') + else: + processing_string = strip_margin(f'''\ + |print('Preprocess data', flush=True) + |# ... preprocessing ... + | + |print('Train model', flush=True) + |# ... train model ... + | + |print('Generate predictions', flush=True) + |# ... generate predictions ...''') + + script = strip_margin(f'''\ + |import anndata as ad + | + |## VIASH START + |# Note: this section is auto-generated by viash at runtime. To edit it, make changes + |# in config.vsh.yaml and then run `viash config inject config.vsh.yaml`. 
+ |par = {{ + | {par_string} + |}} + |meta = {{ + | 'functionality_name': '{par["name"]}' + |}} + |## VIASH END + | + |print('Reading input files', flush=True) + |{read_h5ad_string} + | + |{processing_string} + | + |{write_h5ad_string} + |''') + + return script + +def create_r_script(par, api_spec, type): + args = api_spec['functionality']['arguments'] + + # create the arguments of the par string + par_string = ",\n ".join(f'{arg["key"]} = "{arg["value"]}"' for arg in args) + + # create helpers for reading the h5ad file + read_h5ad_string = "\n".join( + f'{arg["key"]} <- anndata::read_h5ad(par[["{arg["key"]}"]])' + for arg in args + if arg['type'] == "file" + and arg.get("direction", "input") == "input" + ) + + # determine which adata to copy from + copy_from_adata = look_for_adata_arg(args, "method_id" if type == "metric" else "dataset_id") + + # create code for writing the output h5ad files + write_h5ad_string = "\n".join( + write_output_r(arg, copy_from_adata, type == "metric") + for arg in args + if arg["type"] == "file" + and arg.get("direction", "input") == "output" + ) + + if type == 'metric': + processing_string = strip_margin(f'''\ + |cat("Compute metrics\\n") + |# metric_ids and metric_values can have length > 1 + |# but should be of equal length + |uns_metric_ids <- c("{par['name']}") + |uns_metric_values <- c(0.5)''') + else: + processing_string = strip_margin(f'''\ + |cat("Preprocess data\\n") + |# ... preprocessing ... + | + |cat("Train model\\n") + |# ... train model ... + | + |cat("Generate predictions\\n") + |# ... generate predictions ...''') + + script = strip_margin(f'''\ + |library(anndata) + | + |## VIASH START + |par <- list( + | {par_string} + |) + |meta <- list( + | functionality_name = "{par["name"]}" + |) + |## VIASH END + | + |cat("Reading input files\\n") + |{read_h5ad_string} + | + |{processing_string} + | + |{write_h5ad_string} + |''') + + return script + +# def read_viash_config(file): +# file = file.absolute() + +# # read in config +# command = ["viash", "config", "view", str(file)] + +# # Execute the command and capture the output +# output = subprocess.check_output( +# command, +# universal_newlines=True, +# cwd=str(file.parent) +# ) + +# # Parse the output as YAML +# config = yaml.load(output) + +# return config + + +def main(par): + ####### CHECK INPUTS ####### + print("Check inputs", flush=True) + assert re.match("[a-z][a-z0-9_]*", par["name"]), "Name should match the regular expression '[a-z][a-z0-9_]*'. Example: 'my_component'." + assert len(par['name']) <= 50, "Method name should be at most 50 characters." + + pretty_name = re.sub("_", " ", par['name']).title() + + ####### CHECK LANGUAGE ####### + print("Check language", flush=True) + # check language and determine script path + if par["language"] == "python": + script_path = "script.py" + elif par["language"] == "r": + script_path = "script.R" + else: + sys.exit(f"Unrecognized language parameter '{par['language']}'.") + + ## CHECK API FILE + print("Check API file", flush=True) + api_file = Path(par["api_file"]) + viash_yaml = Path(par["viash_yaml"]) + project_dir = viash_yaml.parent + if not api_file.exists(): + comp_types = [x.with_suffix("").name.removeprefix("comp_") for x in api_file.parent.glob("**/comp_*.y*ml")] + list.sort(comp_types) + sys.exit(strip_margin(f"""\ + |Error: Invalid --type argument. + | Reason: Could not find API file at '{api_file.relative_to(project_dir)}'. 
+ | Possible values for --type: {', '.join(comp_types)}.""")) + + ## READ API FILE + print("Read API file", flush=True) + api = read_and_merge_yaml(api_file) + comp_type = api.get("functionality", {}).get("info", {}).get("type", {}) + if not comp_type: + sys.exit(strip_margin(f"""\ + |Error: API file is incorrectly formatted. + | Reason: Could not find component type at `.functionality.info.type`.' + | Please fix the formatting of the API file.""")) + + ####### CREATE OUTPUT DIR ####### + print("Create output dir", flush=True) + out_dir = Path(par["output"]) + out_dir.mkdir(exist_ok=True) + + ####### CREATE CONFIG ####### + print("Create config", flush=True) + config_file = out_dir / "config.vsh.yaml" + + # get config template + config_str = create_config(par, comp_type, pretty_name, script_path) + + with open(config_file, "w") as f: + f.write(config_str) + + ####### CREATE SCRIPT ####### + print("Create script", flush=True) + script_file = out_dir / script_path + + # set reasonable values + set_par_values(api) + + if par["language"] == "python": + script_out = create_python_script(par, api, comp_type) + + if par["language"] == "r": + script_out = create_r_script(par, api, comp_type) + + # write script + with open(script_file, "w") as f: + f.write(script_out) + + print("Done!", flush=True) + + +if __name__ == "__main__": + main(par) diff --git a/src/common/create_component/script.sh b/src/common/create_component/script.sh new file mode 100755 index 0000000000..9fef9ef3a7 --- /dev/null +++ b/src/common/create_component/script.sh @@ -0,0 +1,5 @@ +TASK=dimensionality_reduction +viash run src/common/create_component/config.vsh.yaml -- --task $TASK --type metric --name foor --language r +viash run src/common/create_component/config.vsh.yaml -- --task $TASK --type method --name foor --language r +viash run src/common/create_component/config.vsh.yaml -- --task $TASK --type method --name foopy +viash run src/common/create_component/config.vsh.yaml -- --task $TASK --type metric --name foopy \ No newline at end of file diff --git a/src/common/create_component/test.py b/src/common/create_component/test.py new file mode 100644 index 0000000000..a53e54a18e --- /dev/null +++ b/src/common/create_component/test.py @@ -0,0 +1,52 @@ +import os +import subprocess +from os import path +from ruamel.yaml import YAML + +## VIASH START +meta = { + 'executable': 'foo' +} +## VIASH END + +opv2 = f"{meta['resources_dir']}/openproblems" +output_path = f"{opv2}/src/tasks/label_projection/methods/test_method" + +cmd = [ + meta['executable'], + '--task', 'label_projection', + '--type', 'method', + '--name', 'test_method', + '--language', 'python' +] + +print('>> Running the script as test', flush=True) +out = subprocess.run(cmd, stderr=subprocess.STDOUT, cwd=opv2) + +if out.stdout: + print(out.stdout) + +if out.returncode: + print(f"script: '{cmd}' exited with an error.") + exit(out.returncode) + +print('>> Checking whether output files exist', flush=True) +assert os.path.exists(output_path), "Output dir does not exist" + +conf_f = path.join(output_path, 'config.vsh.yaml') +assert os.path.exists(conf_f), "Config file does not exist" + +script_f = path.join(output_path, "script.py") +assert os.path.exists(script_f), "Script file does not exist" + +print('>> Checking file contents', flush=True) +yaml = YAML(typ='safe', pure=True) +with open(conf_f) as f: + conf_data = yaml.load(f) + +assert conf_data['functionality']['name'] == 'test_method', "Name should be equal to 'test_method'" +# assert 
conf_data['platforms'][0]['image'] == 'python:3.10', "Python image should be equal to python:3.10" + + +print('All checks succeeded!', flush=True) + diff --git a/src/common/create_task_readme/config.vsh.yaml b/src/common/create_task_readme/config.vsh.yaml new file mode 100644 index 0000000000..cff0917b0d --- /dev/null +++ b/src/common/create_task_readme/config.vsh.yaml @@ -0,0 +1,69 @@ +functionality: + name: create_task_readme + namespace: common + description: | + Create a README for the task. + argument_groups: + - name: Inputs + arguments: + - type: string + name: --task + description: Which task the component will be added to. + example: denoising + required: false + - type: file + name: --task_dir + description: Path to the task directory. + default: src/tasks/${VIASH_PAR_TASK} + required: false + - type: file + name: --viash_yaml + description: | + Path to the project config file. Needed for knowing the relative location of a file to the project root. + default: "_viash.yaml" + - type: string + name: --github_url + description: | + URL to the GitHub repository. Needed for linking to the source code. + default: "https://github.com/openproblems-bio/openproblems/tree/main/" + - name: Outputs + arguments: + - type: file + name: --output + direction: output + description: Path to the component directory. Suggested location is `src/tasks//README.md`. + default: src/tasks/${VIASH_PAR_TASK}/README.md + resources: + - type: r_script + path: script.R + - path: /src/common/helper_functions/read_and_merge_yaml.R + - path: /src/common/helper_functions/read_api_files.R + - path: /src/common/helper_functions/strip_margin.R + test_resources: + - type: r_script + path: test.R + - path: /src + dest: openproblems/src + - path: /_viash.yaml + dest: openproblems/_viash.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + packages: [dplyr, purrr, rlang, glue, yaml, fs, cli, igraph, rmarkdown, processx] + - type: apt + packages: [jq, curl] + - type: docker + # download and install quarto-*-linux-amd64.deb from latest release + run: | + release_info=$(curl -s https://api.github.com/repos/quarto-dev/quarto-cli/releases/latest) && \ + download_url=$(printf "%s" "$release_info" | jq -r '.assets[] | select(.name | test("quarto-.*-linux-amd64.deb")) | .browser_download_url') && \ + curl -sL "$download_url" -o /opt/quarto.deb && \ + dpkg -i /opt/quarto.deb && \ + rm /opt/quarto.deb + - type: native + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] + diff --git a/src/common/create_task_readme/render_all.sh b/src/common/create_task_readme/render_all.sh new file mode 100755 index 0000000000..e44195c1ed --- /dev/null +++ b/src/common/create_task_readme/render_all.sh @@ -0,0 +1,10 @@ +#!/bin/bash + +set -e + +TASK_IDS=`ls src/tasks` + +for task_id in $TASK_IDS; do + echo ">> Processing $task_id" + viash run src/common/create_task_readme/config.vsh.yaml -- --task $task_id +done \ No newline at end of file diff --git a/src/common/create_task_readme/script.R b/src/common/create_task_readme/script.R new file mode 100644 index 0000000000..35320e4d97 --- /dev/null +++ b/src/common/create_task_readme/script.R @@ -0,0 +1,134 @@ +library(rlang, quietly = TRUE, warn.conflicts = FALSE) +library(purrr, quietly = TRUE, warn.conflicts = FALSE) +library(dplyr, quietly = TRUE, warn.conflicts = FALSE) + +## VIASH START +par <- list( + "task" = "batch_integration", + "task_dir" = "src/tasks/batch_integration", + "output" = "src/tasks/batch_integration/README.md", + "viash_yaml" 
= "_viash.yaml", + "github_url" = "https://github.com/openproblems-bio/openproblems/tree/main/" +) +meta <- list( + "resources_dir" = "src/common/helper_functions", + "temp_dir" = "temp/" +) +## VIASH END + +if (is.null(par$task) && is.null(par$task_dir)) { + stop("Either 'task' or 'task_dir' must be provided") +} +if (is.null(par$viash_yaml)) { + stop("Argument 'viash_yaml' must be provided") +} +if (is.null(par$output)) { + stop("Argument 'output' must be provided") +} + +# import helper function +source(paste0(meta["resources_dir"], "/read_and_merge_yaml.R")) +source(paste0(meta["resources_dir"], "/strip_margin.R")) +source(paste0(meta["resources_dir"], "/read_api_files.R")) + +cat("Read task info\n") +task_api <- read_task_api(par[["task_dir"]]) + +# determine ordering +root <- .task_graph_get_root(task_api) + +r_graph <- render_task_graph(task_api, root) + +cat("Render API details\n") +order <- names(igraph::bfs(task_api$task_graph, root)$order) +r_details <- map_chr( + order, + function(file_name) { + if (file_name %in% names(task_api$comp_specs)) { + render_component(task_api$comp_specs[[file_name]]) + } else { + render_file(task_api$file_specs[[file_name]]) + } + } +) + +cat("Render authors\n") +authors_str <- + if (nrow(task_api$authors) > 0) { + paste0( + "\n## Authors & contributors\n\n", + task_api$authors %>% knitr::kable() %>% paste(collapse = "\n"), + "\n" + ) + } else { + "" + } +readme_str <- + if (is.null(task_api$task_info$readme) || is.na(task_api$task_info$readme)) { + "" + } else { + paste0( + "\n## README\n\n", + task_api$task_info$readme, + "\n" + ) + } + +cat("Generate qmd content\n") +relative_path <- par[["task_dir"]] %>% + gsub(paste0(dirname(par[["viash_yaml"]]), "/*"), "", .) %>% + gsub("/*$", "", .) +source_url <- paste0(par[["github_url"]], relative_path) +qmd_content <- strip_margin(glue::glue(" + §--- + §title: \"{task_api$task_info$label}\" + §format: gfm + §--- + § + § + § + §{task_api$task_info$summary} + § + §Path to source: [`{relative_path}`]({source_url}) + § + §{readme_str} + § + §## Motivation + § + §{task_api$task_info$motivation} + § + §## Description + § + §{task_api$task_info$description} + §{authors_str} + §## API + § + §{r_graph} + § + §{paste(r_details, collapse = '\n\n')} + § + §"), symbol = "§") + +cat("Write README.qmd to file\n") +qmd_file <- tempfile( + pattern = "README_", + fileext = ".qmd", + tmpdir = meta$temp_dir +) + +if (!dir.exists(meta$temp_dir)) { + dir.create(meta$temp_dir, recursive = TRUE) +} +writeLines(qmd_content, qmd_file) + +cat("Render README.qmd to README.md\n") +out <- processx::run( + command = "quarto", + args = c("render", qmd_file, "--output", "-"), + echo = TRUE +) + +writeLines(out$stdout, par$output) diff --git a/src/common/create_task_readme/test.R b/src/common/create_task_readme/test.R new file mode 100644 index 0000000000..3a981fe7ca --- /dev/null +++ b/src/common/create_task_readme/test.R @@ -0,0 +1,30 @@ +requireNamespace("assertthat", quietly = TRUE) + +## VIASH START +## VIASH END + +opv2 <- paste0(meta$resources_dir, "/openproblems") +output_path <- "output.md" + +cat(">> Running the script as test\n") +system(paste( + meta["executable"], + "--task", "label_projection", + "--output", output_path, + "--task_dir", paste0(opv2, "/src/tasks/label_projection"), + "--viash_yaml", paste0(opv2, "/_viash.yaml") +)) + +cat(">> Checking whether output files exist\n") +assertthat::assert_that(file.exists(output_path)) + +cat(">> Checking file contents\n") +lines <- readLines(output_path) 
+assertthat::assert_that(any(grepl("# Label projection", lines))) +assertthat::assert_that(any(grepl("# Description", lines))) +assertthat::assert_that(any(grepl("# Motivation", lines))) +assertthat::assert_that(any(grepl("# Authors", lines))) +assertthat::assert_that(any(grepl("flowchart LR", lines))) +assertthat::assert_that(any(grepl("# File format:", lines))) + +cat("All checks succeeded!\n") diff --git a/src/common/decompress_gzip/config.vsh.yaml b/src/common/decompress_gzip/config.vsh.yaml new file mode 100644 index 0000000000..2716dc554d --- /dev/null +++ b/src/common/decompress_gzip/config.vsh.yaml @@ -0,0 +1,25 @@ +functionality: + name: decompress_gzip + namespace: common + arguments: + - name: --input + type: file + description: Input file + example: /path/to/file.gz + - name: --output + type: file + description: Output file + example: /path/to/file + direction: output + resources: + - type: bash_script + path: script.sh + test_resources: + - type: bash_script + path: test.sh +platforms: + - type: docker + image: ubuntu:latest + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/common/decompress_gzip/script.sh b/src/common/decompress_gzip/script.sh new file mode 100644 index 0000000000..f0486b6068 --- /dev/null +++ b/src/common/decompress_gzip/script.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +gunzip "$par_input" -c > "$par_output" \ No newline at end of file diff --git a/src/common/decompress_gzip/test.sh b/src/common/decompress_gzip/test.sh new file mode 100644 index 0000000000..17bb20afbf --- /dev/null +++ b/src/common/decompress_gzip/test.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +set -e + +## VIASH START +## VIASH END + +echo "> Creating test file" +echo "Foo bar" > uncompressed.txt + +echo "> Compressing file" +gzip uncompressed.txt -c > compressed.txt.gz + +echo "> Decompressing file" +"$meta_executable" \ + --input "compressed.txt.gz" \ + --output "decompressed.txt" + +echo "> Comparing files" +diff uncompressed.txt decompressed.txt + +echo "> Test succeeded!" \ No newline at end of file diff --git a/src/common/extract_metadata/config.vsh.yaml b/src/common/extract_metadata/config.vsh.yaml new file mode 100644 index 0000000000..76e73cb975 --- /dev/null +++ b/src/common/extract_metadata/config.vsh.yaml @@ -0,0 +1,40 @@ +functionality: + name: extract_metadata + namespace: common + description: Extract the metadata from an h5ad file. + argument_groups: + - name: Inputs + arguments: + - name: --input + type: file + required: true + description: A h5ad file. + - name: --schema + type: file + required: false + description: An optional schema with which to annotate the output + - name: Output + arguments: + - name: --output + type: file + required: true + description: A yaml file containing the metadata. 
+ example: output_meta.yaml + direction: output + resources: + - type: python_script + path: script.py + test_resources: + - path: /resources_test/common/pancreas + - path: /src/datasets/api/file_raw.yaml + - type: python_script + path: test.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + test_setup: + - type: python + packages: viashpy + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/common/extract_metadata/script.py b/src/common/extract_metadata/script.py new file mode 100644 index 0000000000..7a55b50e21 --- /dev/null +++ b/src/common/extract_metadata/script.py @@ -0,0 +1,206 @@ +import anndata as ad +import yaml +import numpy as np +import pandas as pd +import scipy +import os +import datetime + +## VIASH START +par = { + 'input': 'resources_test/common/pancreas/dataset.h5ad', + 'schema': 'src/datasets/api/file_raw.yaml', + 'output': 'output/meta.yaml', +} +## VIASH END + +print('Load data', flush=True) +adata = ad.read_h5ad(par['input']).copy() + +if par["schema"]: + print("Load schema", flush=True) + with open(par["schema"], "r") as f: + schema = yaml.safe_load(f) +else: + schema = None + +#################################################################################################### +## Helper functions for extracting the dataset metadata in uns ## +#################################################################################################### +def is_atomic(obj): + return isinstance(obj, str) or isinstance(obj, int) or isinstance(obj, bool) or isinstance(obj, float) + +def to_atomic(obj): + if isinstance(obj, np.float64): + return float(obj) + elif isinstance(obj, np.int64): + return int(obj) + elif isinstance(obj, np.bool_): + return bool(obj) + elif isinstance(obj, np.str_): + return str(obj) + return obj + +def is_list_of_atomics(obj): + if not isinstance(obj, (list,pd.core.series.Series,np.ndarray)): + return False + return all(is_atomic(elem) for elem in obj) + +def to_list_of_atomics(obj): + if isinstance(obj, pd.core.series.Series): + obj = obj.to_numpy() + if isinstance(obj, np.ndarray): + obj = obj.tolist() + return [to_atomic(elem) for elem in obj] + +def is_dict_of_atomics(obj): + if not isinstance(obj, dict): + return False + return all(is_atomic(elem) for _, elem in obj.items()) + +def to_dict_of_atomics(obj): + return {k: to_atomic(v) for k, v in obj.items()} + + +#################################################################################################### +## Helper functions for extracting metadata about the used data structures ## +#################################################################################################### +def get_structure_shape(obj) -> list: + if isinstance(obj, np.ndarray): + return list(obj.shape) + elif scipy.sparse.issparse(obj): + return list(obj.shape) + elif isinstance(obj, pd.core.frame.DataFrame): + return list(obj.shape) + elif isinstance(obj, pd.core.series.Series): + return list(obj.shape) + elif isinstance(obj, list): + return [len(obj)] + elif isinstance(obj, dict): + return [len(obj)] + elif is_atomic(obj): + return [1] + return None + +def get_structure_type(obj) -> str: + # return one of: atomic, dataFrame, vector, dict, denseMatrix, sparseMatrix + if is_atomic(obj): + return "atomic" + elif isinstance(obj, (list,pd.core.series.Series)): + return "vector" + elif isinstance(obj, dict): + return "dict" + elif isinstance(obj, pd.core.frame.DataFrame): + return "dataframe" + elif scipy.sparse.issparse(obj): + return "sparsematrix" + elif 
isinstance(obj, np.ndarray): + return "densematrix" + return "other: " + str(type(obj)) + +def get_structure_dtype(obj) -> str: + if isinstance(obj, np.ndarray): + return obj.dtype.name + elif isinstance(obj, pd.core.series.Series): + return obj.dtype.name + elif isinstance(obj, pd.core.frame.DataFrame): + return [dtype.name for dtype in obj.dtypes] + elif scipy.sparse.issparse(obj): + return obj.dtype.name + elif is_atomic(obj): + return type(obj).__name__ + return None + +def get_structure_schema_info(struct, key) -> dict: + if schema is None: + return {} + struct_args = schema.get("info", {}).get("slots", {}).get(struct, {}) + if struct_args is None: + return {} + if struct == "X": + return struct_args + + # look for item with the correct name + struct_results = [x for x in struct_args if x.get("name") == key] + + # return None if no match is found + if len(struct_results) != 1: + return {} + + return struct_results[0] + +def get_structure(adata, struct): + adata_struct = getattr(adata, struct) + + # turn `adata_struct` into a dict for `X` + if (struct == "X"): + adata_struct = {"X": adata_struct} if adata_struct is not None else {} + + output = [] + + for key, value in adata_struct.items(): + out = { + "name": key, + "type": get_structure_type(value), + "shape": get_structure_shape(value), + "dtype": get_structure_dtype(value), + } + + # see if the schema has information about this struct + schema_info = get_structure_schema_info(struct, key) + + if schema_info.get("description"): + out["description"] = schema_info.get("description") + if schema_info.get("type"): + out["schema_type"] = schema_info.get("type") + + output.append(out) + + return output + +#################################################################################################### +## Other helper functions ## +#################################################################################################### + +def get_file_size(path: str) -> int: + """Get the file size in bytes of the file at the given path.""" + return os.path.getsize(path) + +def get_file_creation_time(path: str) -> str: + """Get the creation time of the file at the given path.""" + # Get file creation time + creation_time = os.path.getctime(path) + # Convert creation time from seconds since epoch to a readable timestamp + creation_time = datetime.datetime.fromtimestamp(creation_time) + # Format the datetime object as 'DD-MM-YYYY' + creation_time = creation_time.strftime('%d-%m-%Y') + return str(creation_time) + + +print("Extract metadata from object", flush=True) +# Extract metadata about the adata object +uns = {} +for key, val in adata.uns.items(): + if is_atomic(val): + uns[key] = to_atomic(val) + elif is_list_of_atomics(val) and len(val) <= 10: + uns[key] = to_list_of_atomics(val) + elif is_dict_of_atomics(val) and len(val) <= 10: + uns[key] = to_dict_of_atomics(val) + +uns["file_size"] = get_file_size(par["input"]) +uns["date_created"] = get_file_creation_time(par["input"]) + +# Extract metadata about the data structures +structure = { + struct: get_structure(adata, struct) + for struct + in ["X", "obs", "var", "obsp", "varp", "obsm", "varm", "layers", "uns"] +} + +# ¢reate metadata object +meta = {"uns": uns, "structure": structure} + +print("Write metadata to file", flush=True) +with open(par["output"], "w") as f: + yaml.dump(meta, f, indent=2) diff --git a/src/common/extract_metadata/test.py b/src/common/extract_metadata/test.py new file mode 100644 index 0000000000..8af023d8f6 --- /dev/null +++ b/src/common/extract_metadata/test.py 
@@ -0,0 +1,26 @@ +import sys +import re +import pytest +import json +import subprocess + +## VIASH START +## VIASH END + +input_path = meta["resources_dir"] + "/pancreas/dataset.h5ad" +schema_path = meta["resources_dir"] + "/file_raw.yaml" + +def test_run(run_component, tmp_path): + output_path = tmp_path / "meta.yaml" + + run_component([ + "--input", input_path, + "--schema", schema_path, + "--output", str(output_path), + ]) + + assert output_path.exists(), "Output path does not exist" + + +if __name__ == "__main__": + sys.exit(pytest.main([__file__])) diff --git a/src/common/extract_scores/config.vsh.yaml b/src/common/extract_scores/config.vsh.yaml new file mode 100644 index 0000000000..72270b7a95 --- /dev/null +++ b/src/common/extract_scores/config.vsh.yaml @@ -0,0 +1,35 @@ +functionality: + name: "extract_scores" + status: disabled + namespace: "common" + description: "Extract evaluation data frame on output" + arguments: + - name: "--input" + alternatives: ["-i"] + type: "file" + multiple: true + default: "input.h5ad" + description: "Input h5ad files containing metadata and metrics in adata.uns" + - name: "--column_names" + type: "string" + multiple: true + default: [ "dataset_id", "method_id", "metric_ids", "metric_values" ] + description: "Which fields from adata.uns to extract and store as a data frame." + - name: "--output" + alternatives: ["-o"] + type: "file" + direction: "output" + default: "output.tsv" + description: "Output tsv" + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ tidyverse ] + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/common/extract_scores/script.R b/src/common/extract_scores/script.R new file mode 100644 index 0000000000..6b540380ab --- /dev/null +++ b/src/common/extract_scores/script.R @@ -0,0 +1,30 @@ +cat("Loading dependencies\n") +library(anndata, warn.conflicts = FALSE) +options(tidyverse.quiet = TRUE) +library(tidyverse) +library(assertthat) + +## VIASH START +par <- list( + input = "resources_test/label_projection/pancreas/knn_accuracy.h5ad", + output = "scores.tsv" +) +inp <- par$input[[1]] +## VIASH END + +cat("Reading input h5ad files\n") +scores <- map_df(par$input, function(inp) { + cat("Reading '", inp, "'\n", sep = "") + ad <- read_h5ad(inp) + + for (uns_name in par$column_names) { + assert_that( + uns_name %in% names(ad$uns), + msg = paste0("File ", inp, " must contain `uns['", uns_name, "']`") + ) + } + + data.frame(ad$uns[par$column_names]) +}) + +write_tsv(scores, par$output) diff --git a/src/common/helper_functions/read_and_merge_yaml.R b/src/common/helper_functions/read_and_merge_yaml.R new file mode 100644 index 0000000000..932d3feb92 --- /dev/null +++ b/src/common/helper_functions/read_and_merge_yaml.R @@ -0,0 +1,144 @@ +#' Read a Viash YAML +#' +#' If the YAML contains a "__merge__" key anywhere in the yaml, +#' the path specified in that YAML will be read and the two +#' lists will be merged. This is a recursive procedure. +#' +#' @param path Path to Viash YAML +read_and_merge_yaml <- function(path, project_path = .ram_find_project(path)) { + path <- normalizePath(path, mustWork = FALSE) + data <- tryCatch({ + suppressWarnings(yaml::read_yaml(path)) + }, error = function(e) { + stop("Could not read ", path, ". 
Error: ", e) + }) + .ram_process_merge(data, data, path, project_path) +} + +.ram_find_project <- function(path) { + path <- normalizePath(path, mustWork = FALSE) + check <- paste0(dirname(path), "/_viash.yaml") + if (file.exists(check)) { + dirname(check) + } else if (check == "//_viash.yaml") { + NULL + } else { + .ram_find_project(dirname(check)) + } +} + +.ram_is_named_list <- function(obj) { + is.null(obj) || (is.list(obj) && (length(obj) == 0 || !is.null(names(obj)))) +} + +.ram_process_merge <- function(data, root_data, path, project_path) { + if (.ram_is_named_list(data)) { + # check whether children have `__merge__` entries + processed_data <- lapply(data, function(dat) { + .ram_process_merge(dat, root_data, path, project_path) + }) + processed_data <- lapply(names(data), function(nm) { + dat <- data[[nm]] + .ram_process_merge(dat, root_data, path, project_path) + }) + names(processed_data) <- names(data) + + # if current element has __merge__, read list2 yaml and combine with data + new_data <- + if ("__merge__" %in% names(processed_data) && !.ram_is_named_list(processed_data$`__merge__`)) { + new_data_path <- .ram_resolve_path( + path = processed_data$`__merge__`, + project_path = project_path, + parent_path = dirname(path) + ) + read_and_merge_yaml(new_data_path, project_path) + } else if ("$ref" %in% names(processed_data) && !.ram_is_named_list(processed_data$`$ref`)) { + ref_parts <- strsplit(processed_data$`$ref`, "#")[[1]] + + # resolve the path in $ref + x <- + if (ref_parts[[1]] == "") { + root_data + } else { + new_data_path <- .ram_resolve_path( + path = ref_parts[[1]], + project_path = project_path, + parent_path = dirname(path) + ) + new_data_path <- normalizePath(new_data_path, mustWork = FALSE) + + # read in the new data + tryCatch({ + suppressWarnings(yaml::read_yaml(new_data_path)) + }, error = function(e) { + stop("Could not read ", new_data_path, ". 
Error: ", e) + }) + } + x_root <- x + + + # Navigate the path and retrieve the referenced data + ref_path_parts <- unlist(strsplit(ref_parts[[2]], "/")) + for (part in ref_path_parts) { + if (part == "") { + next + } else if (part %in% names(x)) { + x <- x[[part]] + } else { + stop("Could not find ", processed_data$`$ref`, " in ", path) + } + } + + # postprocess the new data + if (ref_parts[[1]] == "") { + x + } else { + .ram_process_merge(x, x_root, new_data_path, project_path) + } + } else { + list() + } + + .ram_deep_merge(new_data, processed_data) + } else if (is.list(data)) { + lapply(data, function(dat) { + .ram_process_merge(dat, root_data, path, project_path) + }) + } else { + data + } +} + +.ram_resolve_path <- function(path, project_path, parent_path) { + ifelse( + grepl("^/", path), + paste0(project_path, "/", path), + fs::path_abs(path, parent_path) + ) +} + +.ram_deep_merge <- function(list1, list2) { + if (.ram_is_named_list(list1) && .ram_is_named_list(list2)) { + # if list1 and list2 are objects, recursively merge + keys <- unique(c(names(list1), names(list2))) + out <- lapply(keys, function(key) { + if (key %in% names(list1)) { + if (key %in% names(list2)) { + .ram_deep_merge(list1[[key]], list2[[key]]) + } else { + list1[[key]] + } + } else { + list2[[key]] + } + }) + names(out) <- keys + out + } else if (is.list(list1) && is.list(list2)) { + # if list1 and list2 are both lists, append + c(list1, list2) + } else { + # else override list1 with list2 + list2 + } +} \ No newline at end of file diff --git a/src/common/helper_functions/read_and_merge_yaml.py b/src/common/helper_functions/read_and_merge_yaml.py new file mode 100644 index 0000000000..b74995aed1 --- /dev/null +++ b/src/common/helper_functions/read_and_merge_yaml.py @@ -0,0 +1,52 @@ +def read_and_merge_yaml(path): + """Read a Viash YAML + + If the YAML contains a "__merge__" key anywhere in the yaml, + the path specified in that YAML will be read and the two + lists will be merged. This is a recursive procedure. 
+ + Arguments: + path -- Path to the Viash YAML""" + from ruamel.yaml import YAML + + yaml = YAML(typ='safe', pure=True) + + with open(path, 'r') as stream: + data = yaml.load(stream) + return _ram_process_merge(data, path) + +def _ram_deep_merge(dict1, dict2): + if isinstance(dict1, dict) and isinstance(dict2, dict): + keys = set(list(dict1.keys()) + list(dict2.keys())) + out = {} + for key in keys: + if key in dict1: + if key in dict2: + out[key] = _ram_deep_merge(dict1[key], dict2[key]) + else: + out[key] = dict1[key] + else: + out[key] = dict2[key] + return out + elif isinstance(dict1, list) and isinstance(dict2, list): + return dict1 + dict2 + else: + return dict2 + +def _ram_process_merge(data, path): + import os + if isinstance(data, dict): + processed_data = {k: _ram_process_merge(v, path) for k, v in data.items()} + + if "__merge__" in processed_data: + new_data_path = os.path.join(os.path.dirname(path), processed_data["__merge__"]) + new_data = read_and_merge_yaml(new_data_path) + else: + new_data = {} + + return _ram_deep_merge(new_data, processed_data) + elif isinstance(data, list): + return [_ram_process_merge(dat, path) for dat in data] + else: + return data + diff --git a/src/common/helper_functions/read_anndata_partial.py b/src/common/helper_functions/read_anndata_partial.py new file mode 100644 index 0000000000..efbea0592d --- /dev/null +++ b/src/common/helper_functions/read_anndata_partial.py @@ -0,0 +1,77 @@ +import warnings +from pathlib import Path +import anndata as ad +import h5py +from scipy.sparse import csr_matrix +from anndata.experimental import read_elem, sparse_dataset + + +def read_anndata( + file: str, + backed: bool = False, + **kwargs +) -> ad.AnnData: + """ + Read anndata file + :param file: path to anndata file in h5ad format + :param kwargs: AnnData parameter to group mapping + """ + assert Path(file).exists(), f'File not found: {file}' + + f = h5py.File(file, 'r') + kwargs = {x: x for x in f} if not kwargs else kwargs + if len(f.keys()) == 0: + return ad.AnnData() + # check if keys are available + for name, slot in kwargs.items(): + if slot not in f: + warnings.warn( + f'Cannot find "{slot}" for AnnData parameter `{name}` from "{file}"' + ) + adata = read_partial(f, backed=backed, **kwargs) + if not backed: + f.close() + + return adata + + +def read_partial( + group: h5py.Group, + backed: bool = False, + force_sparse_types: [str, list] = None, + **kwargs +) -> ad.AnnData: + """ + Partially read h5py groups + :params group: file group + :params force_sparse_types: encoding types to convert to sparse_dataset via csr_matrix + :params backed: read sparse matrix as sparse_dataset + :params **kwargs: dict of slot_name: slot, by default use all available slot for the h5py file + :return: AnnData object + """ + if force_sparse_types is None: + force_sparse_types = [] + elif isinstance(force_sparse_types, str): + force_sparse_types = [force_sparse_types] + slots = {} + if backed: + print('Read as backed sparse matrix...') + + for slot_name, slot in kwargs.items(): + print(f'Read slot "{slot}", store as "{slot_name}"...') + if slot not in group: + warnings.warn(f'Slot "{slot}" not found, skip...') + slots[slot_name] = None + else: + elem = group[slot] + iospec = ad._io.specs.get_spec(elem) + if iospec.encoding_type in ("csr_matrix", "csc_matrix") and backed: + slots[slot_name] = sparse_dataset(elem) + elif iospec.encoding_type in force_sparse_types: + slots[slot_name] = csr_matrix(read_elem(elem)) + if backed: + slots[slot_name] = 
sparse_dataset(slots[slot_name]) + else: + slots[slot_name] = read_elem(elem) + return ad.AnnData(**slots) + diff --git a/src/common/helper_functions/read_api_files.R b/src/common/helper_functions/read_api_files.R new file mode 100644 index 0000000000..f2cf49b2f8 --- /dev/null +++ b/src/common/helper_functions/read_api_files.R @@ -0,0 +1,493 @@ + +anndata_struct_names <- c("obs", "var", "obsm", "obsp", "varm", "varp", "layers", "uns") + +read_file_spec <- function(path) { + spec <- read_and_merge_yaml(path) + out <- list( + info = read_file_info(spec, path) + ) + if (out$info$file_type == "h5ad" || "slots" %in% names(spec$info)) { + out$info$file_type <- "h5ad" + out$slots <- read_anndata_slots(spec, path) + } + if (out$info$file_type == "csv" || out$info$file_type == "tsv" || out$info$file_type == "parquet") { + out$columns <- read_tabular_columns(spec, path) + } + out +} +read_file_info <- function(spec, path) { + # TEMP: make it readable + spec$info$slots <- NULL + df <- list_as_tibble(spec) + if (list_contains_tibble(spec$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec$info)) + } + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$description <- df$description %||% NA_character_ %>% as.character + df$summary <- df$summary %||% NA_character_ %>% as.character + as_tibble(df) +} +read_anndata_slots <- function(spec, path) { + map_df( + anndata_struct_names, + function(struct_name, slot) { + slot <- spec$info$slots[[struct_name]] + if (is.null(slot)) return(NULL) + df <- map_df(slot, as.data.frame) + df$struct <- struct_name + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% TRUE %|% TRUE + df$multiple <- df$multiple %||% FALSE %|% FALSE + as_tibble(df) + } + ) +} +read_tabular_columns <- function(spec, path) { + map_df( + spec$info$columns, + function(column) { + df <- list_as_tibble(column) + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% TRUE %|% TRUE + df$multiple <- df$multiple %||% FALSE %|% FALSE + as_tibble(df) + } + ) +} + +format_file_format <- function(spec) { + if (spec$info$file_type == "h5ad") { + example <- spec$slots %>% + group_by(struct) %>% + summarise( + str = paste0(unique(struct), ": ", paste0("'", name, "'", collapse = ", ")) + ) %>% + arrange(match(struct, anndata_struct_names)) + + c(" AnnData object", paste0(" ", example$str)) + } else if (spec$info$file_type == "csv" || spec$info$file_type == "tsv" || spec$info$file_type == "parquet") { + example <- spec$columns %>% + summarise( + str = paste0("'", name, "'", collapse = ", ") + ) + + c(" Tabular data", paste0(" ", example$str)) + } else { + "" + } +} + +format_file_format_as_kable <- function(spec) { + if (spec$info$file_type == "h5ad") { + spec$slots %>% + mutate( + tag_str = pmap_chr(lst(required), function(required) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Slot = paste0("`", struct, "[\"", name, "\"]`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + description %>% gsub(" *\n *", " ", .) %>% gsub("\\. *$", "", .), + "." 
+ ) + ) %>% + knitr::kable() + } else if (spec$info$file_type == "csv" || spec$info$file_type == "tsv" || spec$info$file_type == "parquet") { + spec$columns %>% + mutate( + tag_str = pmap_chr(lst(required), function(required) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Column = paste0("`", name, "`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + description %>% gsub(" *\n *", " ", .) %>% gsub("\\. *$", "", .), + "." + ) + ) %>% + knitr::kable() + } else { + "" + } +} + +list_contains_tibble <- function(li) { + is.list(li) && any(sapply(li, is.atomic)) +} + +list_as_tibble <- function(li) { + as.data.frame(li[sapply(li, is.atomic)], check.names = FALSE) +} + +read_comp_spec <- function(path) { + spec_yaml <- read_and_merge_yaml(path) + list( + info = read_comp_info(spec_yaml, path), + args = read_comp_args(spec_yaml, path) + ) +} + +read_comp_info <- function(spec_yaml, path) { + # TEMP: make it readable + spec_yaml$functionality$arguments <- NULL + spec_yaml$functionality$argument_groups <- NULL + + df <- list_as_tibble(spec_yaml$functionality) + if (nrow(df) == 0) { + df <- data.frame(a = 1)[, integer(0)] + } + if (list_contains_tibble(spec_yaml$functionality$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec_yaml$functionality$info)) + } + if (list_contains_tibble(spec_yaml$functionality$info$type_info)) { + df <- dplyr::bind_cols(df, list_as_tibble(spec_yaml$functionality$info$type_info)) + } + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + as_tibble(df) +} + +read_comp_args <- function(spec_yaml, path) { + arguments <- spec_yaml$functionality$arguments + for (arg_group in spec_yaml$functionality$argument_groups) { + arguments <- c(arguments, arg_group$arguments) + } + map_df(arguments, function(arg) { + df <- list_as_tibble(arg) + if (list_contains_tibble(arg$info)) { + df <- dplyr::bind_cols(df, list_as_tibble(arg$info)) + } + df$file_name <- basename(path) %>% gsub("\\.yaml", "", .) + df$arg_name <- gsub("^-*", "", arg$name) + df$direction <- df$direction %||% "input" %|% "input" + df$parent <- df$`__merge__` %||% NA_character_ %>% basename() %>% gsub("\\.yaml", "", .) + df$required <- df$required %||% FALSE %|% FALSE + df$default <- df$default %||% NA_character_ %>% as.character + df$example <- df$example %||% NA_character_ %>% as.character + df$description <- df$description %||% NA_character_ %>% as.character + df$summary <- df$summary %||% NA_character_ %>% as.character + df + }) +} + +format_comp_args_as_tibble <- function(spec) { + if (nrow(spec$args) == 0) return("") + spec$args %>% + mutate( + tag_str = pmap_chr(lst(required, direction), function(required, direction) { + out <- c() + if (!required) { + out <- c(out, "Optional") + } + if (direction == "output") { + out <- c(out, "Output") + } + if (length(out) == 0) { + "" + } else { + paste0("(_", paste(out, collapse = ", "), "_) ") + } + }) + ) %>% + transmute( + Name = paste0("`--", arg_name, "`"), + Type = paste0("`", type, "`"), + Description = paste0( + tag_str, + (summary %|% description) %>% gsub(" *\n *", " ", .) %>% gsub("\\. 
*$", "", .), + ".", + ifelse(!is.na(default), paste0(" Default: `", default, "`."), "") + ) + ) %>% + knitr::kable() +} + +# path <- "src/datasets/api/comp_processor_knn.yaml" +render_component <- function(spec) { + if (is.character(spec)) { + spec <- read_comp_spec(spec) + } + + strip_margin(glue::glue(" + §## Component type: {spec$info$label} + § + §Path: [`src/{spec$info$namespace}`](https://github.com/openproblems-bio/openproblems/tree/main/src/{spec$info$namespace}) + § + §{spec$info$summary} + § + §Arguments: + § + §:::{{.small}} + §{paste(format_comp_args_as_tibble(spec), collapse = '\n')} + §::: + § + §"), symbol = "§") +} + +# path <- "src/datasets/api/file_pca.yaml" +render_file <- function(spec) { + if (is.character(spec)) { + spec <- read_file_spec(spec) + } + + if (!"label" %in% names(spec$info)) { + spec$info$label <- basename(spec$info$example) + } + + example <- + if (is.null(spec$info$example) || is.na(spec$info$example)) { + "" + } else { + paste0("Example file: `", spec$info$example, "`") + } + + description <- + if (is.null(spec$info$description) || is.na(spec$info$description)) { + "" + } else { + paste0("Description:\n\n", spec$info$description) + } + + strip_margin(glue::glue(" + §## File format: {spec$info$label} + § + §{spec$info$summary %||% ''} + § + §{example} + § + §{description} + § + §Format: + § + §:::{{.small}} + §{paste(format_file_format(spec), collapse = '\n')} + §::: + § + §Slot description: + § + §:::{{.small}} + §{paste(format_file_format_as_kable(spec), collapse = '\n')} + §::: + § + §"), symbol = "§") +} + +# path <- "src/tasks/denoising" +read_task_api <- function(path) { + cli::cli_inform("Looking for project root") + project_path <- .ram_find_project(path) + api_dir <- paste0(path, "/api") + + cli::cli_inform("Reading task info") + task_info_yaml <- list.files(api_dir, pattern = "task_info.ya?ml", full.names = TRUE) + assertthat::assert_that(length(task_info_yaml) == 1) + task_info <- read_and_merge_yaml(task_info_yaml, project_path) + + cli::cli_inform("Reading task authors") + authors <- map_df(task_info$authors, function(aut) { + aut$roles <- paste(aut$roles, collapse = ", ") + list_as_tibble(aut) + }) + + cli::cli_inform("Reading component yamls") + comp_yamls <- list.files(api_dir, pattern = "comp_.*\\.ya?ml", full.names = TRUE) + comps <- map(comp_yamls, read_comp_spec) + comp_info <- map_df(comps, "info") + comp_args <- map_df(comps, "args") + names(comps) <- basename(comp_yamls) %>% gsub("\\..*$", "", .) + + cli::cli_inform("Reading file yamls") + file_yamls <- .ram_resolve_path( + path = na.omit(unique(comp_args$`__merge__`)), + project_path = project_path, + parent_path = api_dir + ) + files <- map(file_yamls, read_file_spec) + names(files) <- basename(file_yamls) %>% gsub("\\..*$", "", .) 
+ file_info <- map_df(files, "info") + file_slots <- map_df(files, "slots") + + cli::cli_inform("Generating task graph") + task_graph <- create_task_graph(file_info, comp_info, comp_args) + + list( + task_info = task_info, + file_specs = files, + file_info = file_info, + file_slots = file_slots, + comp_specs = comps, + comp_info = comp_info, + comp_args = comp_args, + task_graph = task_graph, + authors = authors + ) +} + + +create_task_graph <- function(file_info, comp_info, comp_args) { + clean_id <- function(id) { + gsub("graph", "graaf", id) + } + nodes <- + bind_rows( + file_info %>% + mutate(id = file_name, label = label, is_comp = FALSE), + comp_info %>% + mutate(id = file_name, label = label, is_comp = TRUE) + ) %>% + select(id, label, everything()) %>% + mutate(str = paste0( + " ", + clean_id(id), + ifelse(is_comp, "[/\"", "(\""), + label, + ifelse(is_comp, "\"/]", "\")") + )) + edges <- bind_rows( + comp_args %>% + filter(type == "file", direction == "input") %>% + mutate( + from = parent, + to = file_name, + arrow = "---" + ), + comp_args %>% + filter(type == "file", direction == "output") %>% + mutate( + from = file_name, + to = parent, + arrow = "-->" + ) + ) %>% + select(from, to, everything()) %>% + mutate(str = paste0(" ", clean_id(from), arrow, clean_id(to))) + + igraph::graph_from_data_frame( + edges, + vertices = nodes, + directed = TRUE + ) +} + +.task_graph_get_root <- function(task_api) { + root <- names(which(igraph::degree(task_api$task_graph, mode = "in") == 0)) + if (length(root) > 1) { + warning( + "There should probably only be one node with in-degree equal to 0.\n", + " Nodes with in-degree == 0: ", paste(root, collapse = ", ") + ) + } + root[[1]] +} + +render_task_graph <- function(task_api, root = .task_graph_get_root(task_api)) { + order <- names(igraph::bfs(task_api$task_graph, root)$order) + + vdf <- igraph::as_data_frame(task_api$task_graph, "vertices") %>% + arrange(match(name, order)) + edf <- igraph::as_data_frame(task_api$task_graph, "edges") %>% + arrange(match(from, order), match(to, order)) + + strip_margin(glue::glue(" + §```mermaid + §flowchart LR + §{paste(vdf$str, collapse = '\n')} + §{paste(edf$str, collapse = '\n')} + §``` + §"), symbol = "§") +} + + + +# Recursive function to process each property with indentation +.render_example_process_property <- function(prop, prop_name = NULL, indent_level = 0) { + if (is.null(prop_name)) { + prop_name <- "" + } + + out <- c() + + # define helper variables + indent_spaces <- strrep(" ", indent_level) + next_indent_spaces <- strrep(" ", indent_level + 2) + + # add comment if available + if ("description" %in% names(prop)) { + comment <- gsub("\n", paste0("\n", indent_spaces, "# "), stringr::str_trim(prop$description)) + out <- c(out, indent_spaces, "# ", comment, "\n") + } + + # add variable + out <- c(out, indent_spaces, prop_name, ": ") + + if (prop$type == "object" && "properties" %in% names(prop)) { + # Handle object with properties + prop_names <- setdiff(names(prop$properties), "additionalProperties") + sub_props <- unlist(lapply(prop_names, function(sub_prop_name) { + prop_out <- .render_example_process_property( + prop$properties[[sub_prop_name]], + sub_prop_name, + indent_level + 2 + ) + c(prop_out, "\n") + })) + c(out, "\n", sub_props[-length(sub_props)]) + } else if (prop$type == "array") { + if (is.list(prop$items) && "properties" %in% names(prop$items)) { + # Handle array of objects + array_items_yaml <- unlist(lapply(names(prop$items$properties), function(item_prop_name) { + prop_out <- 
.render_example_process_property( + prop$items$properties[[item_prop_name]], + item_prop_name, + indent_level + 4 + ) + c(prop_out, "\n") + })) + c(out, "\n", next_indent_spaces, "- ", array_items_yaml[-1]) + } else { + # Handle simple array + c(out, "[ ... ]") + } + } else { + c(out, "...") + } +} + +# Function for rendering an example yaml based on a JSON schema +render_example <- function(json_schema) { + if (!"properties" %in% names(json_schema)) { + return("") + } + text <- + unlist(lapply(names(json_schema$properties), function(prop_name) { + out <- .render_example_process_property( + json_schema$properties[[prop_name]], + prop_name, + 0 + ) + c(out, "\n") + })) + + paste(text, collapse = "") +} \ No newline at end of file diff --git a/src/common/helper_functions/setup_logger.py b/src/common/helper_functions/setup_logger.py new file mode 100644 index 0000000000..ae71eb9611 --- /dev/null +++ b/src/common/helper_functions/setup_logger.py @@ -0,0 +1,12 @@ +def setup_logger(): + import logging + from sys import stdout + + logger = logging.getLogger() + logger.setLevel(logging.INFO) + console_handler = logging.StreamHandler(stdout) + logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s") + console_handler.setFormatter(logFormatter) + logger.addHandler(console_handler) + + return logger \ No newline at end of file diff --git a/src/common/helper_functions/strip_margin.R b/src/common/helper_functions/strip_margin.R new file mode 100644 index 0000000000..3830d58d79 --- /dev/null +++ b/src/common/helper_functions/strip_margin.R @@ -0,0 +1,3 @@ +strip_margin <- function(text, symbol = "\\|") { + gsub(paste0("(^|\n)[ \t]*", symbol), "\\1", text) +} \ No newline at end of file diff --git a/src/common/helper_functions/strip_margin.py b/src/common/helper_functions/strip_margin.py new file mode 100644 index 0000000000..fbfb39dec9 --- /dev/null +++ b/src/common/helper_functions/strip_margin.py @@ -0,0 +1,3 @@ +def strip_margin(text: str) -> str: + import re + return re.sub("(^|\n)[ \t]*\|", "\\1", text) \ No newline at end of file diff --git a/src/common/helper_functions/subset_anndata.py b/src/common/helper_functions/subset_anndata.py new file mode 100644 index 0000000000..80bd160872 --- /dev/null +++ b/src/common/helper_functions/subset_anndata.py @@ -0,0 +1,83 @@ +"""Helper functions related to subsetting AnnData objects based on the file format +specifications in the .config.vsh.yaml and slot mapping overrides.""" + +def read_config_slots_info(config_file, slot_mapping = {}): + """Read the .config.vsh.yaml to find out which output slots need to be copied to which output file. + + Arguments: + config_file -- Path to the .config.vsh.yaml file (required). + slot_mapping -- Which slots to retain. Must be a dictionary whose keys are the names + of the AnnData structs, and values is another dictionary with destination value + names as keys and source value names as values. 
+ Example of slot_mapping: + ``` + slot_mapping = { + "layers": { + "counts": par["layer_counts"], + }, + "obs": { + "cell_type": par["obs_cell_type"], + "batch": par["obs_batch"], + } + } + ``` + """ + import yaml + import re + + # read output spec from yaml + with open(config_file, "r") as object_name: + config = yaml.safe_load(object_name) + + output_struct_slots = {} + + # fetch info on which slots should be copied to which file + for arg in config["functionality"]["arguments"]: + # argument is an output file with a slot specification + if arg["direction"] == "output" and arg.get("info", {}).get("slots"): + object_name = re.sub("--", "", arg["name"]) + + struct_slots = arg['info']['slots'] + out = {} + for (struct, slots) in struct_slots.items(): + out_struct = {} + for slot in slots: + # if slot_mapping[struct][slot['name']] exists, use that as the source slot name + # otherwise use slot['name'] + source_slot = slot_mapping.get(struct, {}).get(slot["name"], slot["name"]) + out_struct[slot["name"]] = source_slot + out[struct] = out_struct + + output_struct_slots[object_name] = out + + return output_struct_slots + +# create new anndata objects according to api spec +def subset_anndata(adata, slot_info): + """Create new anndata object according to slot info specifications. + + Arguments: + adata -- An AnnData object to subset (required) + slot_info -- Which slots to retain, typically one of the items in the output of read_config_slots_info. + Must be a dictionary whose keys are the names of the AnnData structs, and values is another + dictionary with destination value names as keys and source value names as values. + """ + import pandas as pd + import anndata as ad + + structs = ["layers", "obs", "var", "uns", "obsp", "obsm", "varp", "varm"] + kwargs = {} + + for struct in structs: + slot_mapping = slot_info.get(struct, {}) + data = {dest : getattr(adata, struct)[src] for (dest, src) in slot_mapping.items()} + if len(data) > 0: + if struct in ['obs', 'var']: + data = pd.concat(data, axis=1) + kwargs[struct] = data + elif struct in ['obs', 'var']: + # if no columns need to be copied, we still need an 'obs' and a 'var' + # to help determine the shape of the adata + kwargs[struct] = getattr(adata, struct).iloc[:,[]] + + return ad.AnnData(**kwargs) \ No newline at end of file diff --git a/src/common/library.bib b/src/common/library.bib new file mode 100644 index 0000000000..af730fe8cd --- /dev/null +++ b/src/common/library.bib @@ -0,0 +1,2191 @@ +@misc{10x2018pbmc, + title = {1k PBMCs from a Healthy Donor (v3 chemistry)}, + author = {{10x Genomics}}, + year = {2018}, + url = {https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0} +} + + +@misc{10x2019heart, + title = {Human Heart}, + author = {{10x Genomics}}, + year = {2019}, + url = {https://www.10xgenomics.com/datasets/human-heart-1-standard-1-0-0} +} + + +@misc{10x2019lymph, + title = {Human Lymph Node}, + author = {{10x Genomics}}, + year = {2019}, + url = {https://www.10xgenomics.com/datasets/human-lymph-node-1-standard-1-0-0} +} + + +@misc{10x2019pbmc, + title = {5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor with a Panel of TotalSeq-B Antibodies (v3 chemistry)}, + author = {{10x Genomics}}, + year = {2019}, + url = {https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-with-cell-surface-proteins-v-3-chemistry-3-1-standard-3-1-0} +} + + +@misc{10x2020breast, + title = {Human Breast Cancer: Whole 
Transcriptome Analysis}, + author = {{10x Genomics}}, + year = {2020}, + url = {https://www.10xgenomics.com/datasets/human-breast-cancer-whole-transcriptome-analysis-1-standard-1-2-0} +} + + +@misc{10x2020cerebellum, + title = {Human Cerebellum: Whole Transcriptome Analysis}, + author = {{10x Genomics}}, + year = {2020}, + url = {https://www.10xgenomics.com/datasets/human-cerebellum-whole-transcriptome-analysis-1-standard-1-2-0} +} + + +@misc{10x2020kidney, + title = {Mouse Kidney Section (Coronal)}, + author = {{10x Genomics}}, + year = {2020}, + url = {https://www.10xgenomics.com/datasets/mouse-kidney-section-coronal-1-standard-1-1-0} +} + + +@misc{10x2021breast, + title = {Human Breast Cancer: Ductal Carcinoma In Situ, Invasive Carcinoma (FFPE)}, + author = {{10x Genomics}}, + year = {2021}, + url = {https://www.10xgenomics.com/datasets/human-breast-cancer-ductal-carcinoma-in-situ-invasive-carcinoma-ffpe-1-standard-1-3-0} +} + + +@misc{10x2021prostate, + title = {Normal Human Prostate (FFPE)}, + author = {{10x Genomics}}, + year = {2021}, + url = {https://www.10xgenomics.com/datasets/normal-human-prostate-ffpe-1-standard-1-3-0} +} + + +@misc{10x2022brain, + title = {Mouse Brain Coronal Section 1 (FFPE)}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/mouse-brain-coronal-section-1-ffpe-2-standard} +} + + +@misc{10x2022cervical, + title = {Human Cervical Cancer (FFPE)}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/human-cervical-cancer-1-standard} +} + + +@misc{10x2022olfactory, + title = {Adult Mouse Olfactory Bulb}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/adult-mouse-olfactory-bulb-1-standard-1} +} + + +@misc{10x2022intestine, + title = {Human Intestine Cancer (FPPE)}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/human-intestine-cancer-1-standard} +} + + +@misc{10x2022melanoma, + title = {Human Melanoma, IF Stained (FFPE)}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/human-melanoma-if-stained-ffpe-2-standard} +} + + +@misc{10x2022prostate, + title = {Human Prostate Cancer, Adjacent Normal Section with IF Staining (FFPE)}, + author = {{10x Genomics}}, + year = {2022}, + url = {https://www.10xgenomics.com/datasets/human-prostate-cancer-adjacent-normal-section-with-if-staining-ffpe-1-standard} +} + + +@misc{10x2023brain, + title = {Human Brain Cancer, 11 mm Capture Area (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/human-brain-cancer-11-mm-capture-area-ffpe-2-standard} +} + + +@misc{10x2023colon, + title = {Visium CytAssist Gene Expression Libraries of Post-Xenium Human Colon Cancer (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/visium-cytassist-gene-expression-libraries-of-post-xenium-human-colon-cancer-ffpe-using-the-human-whole-transcriptome-probe-set-2-standard} +} + + +@misc{10x2023colorectal, + title = {Human Colorectal Cancer, 11 mm Capture Area (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/human-colorectal-cancer-11-mm-capture-area-ffpe-2-standard} +} + + +@misc{10x2023embryo, + title = {Visium CytAssist, Mouse Embryo, 11 mm Capture Area (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = 
{https://www.10xgenomics.com/datasets/visium-cytassist-mouse-embryo-11-mm-capture-area-ffpe-2-standard} +} + + +@misc{10x2023kidney, + title = {Human Kidney, 11 mm Capture Area (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/human-kidney-11-mm-capture-area-ffpe-2-standard} +} + + +@misc{10x2023lung, + title = {Human Lung Cancer, 11 mm Capture Area (FFPE)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/human-lung-cancer-11-mm-capture-area-ffpe-2-standard} +} + + +@misc{10x2023mousebrain, + title = {Visium CytAssist Gene Expression Libraries of Post-Xenium Mouse Brain (FF)}, + author = {{10x Genomics}}, + year = {2023}, + url = {https://www.10xgenomics.com/datasets/visium-cytassist-gene-expression-libraries-of-post-xenium-mouse-brain-ff-using-the-mouse-whole-transcriptome-probe-set-2-standard} +} + + +@article{agostinis2022newwave, + doi = {10.1093/bioinformatics/btac149}, + url = {https://doi.org/10.1093/bioinformatics/btac149}, + year = {2022}, + month = {Mar.}, + publisher = {Oxford University Press ({OUP})}, + volume = {38}, + number = {9}, + pages = {2648--2650}, + author = {Federico Agostinis and Chiara Romualdi and Gabriele Sales and Davide Risso}, + editor = {Yann Ponty}, + title = {NewWave: a scalable R/Bioconductor package for the dimensionality reduction and batch effect removal of single-cell {RNA}-seq data}, + journal = {Bioinformatics} +} + + +@article{agrawal2021mde, + title = {Minimum-Distortion Embedding}, + author = {Akshay Agrawal and Alnur Ali and Stephen Boyd}, + year = {2021}, + journal = {Foundations and Trends{\textregistered} in Machine Learning}, + publisher = {Now Publishers}, + volume = {14}, + number = {3}, + pages = {211--378}, + doi = {10.1561/2200000090}, + url = {https://doi.org/10.1561/2200000090} +} + + +@article{aliee2021autogenes, + title = {{AutoGeneS}: Automatic gene selection using multi-objective optimization for {RNA}-seq deconvolution}, + author = {Hananeh Aliee and Fabian J. 
Theis}, + year = {2021}, + month = {Jul.}, + journal = {Cell Systems}, + publisher = {Elsevier {BV}}, + volume = {12}, + number = {7}, + pages = {706--715.e4}, + doi = {10.1016/j.cels.2021.05.006}, + url = {https://doi.org/10.1016/j.cels.2021.05.006} +} + + +@inproceedings{amelio2015normalized, + doi = {10.1145/2808797.2809344}, + url = {https://doi.org/10.1145/2808797.2809344}, + year = {2015}, + month = {Aug.}, + publisher = {{ACM}}, + author = {Alessia Amelio and Clara Pizzuti}, + title = {Is Normalized Mutual Information a Fair Measure for Comparing Community Detection Methods?}, + booktitle = {Proceedings of the 2015 {IEEE}/{ACM} International Conference on Advances in Social Networks Analysis and Mining 2015} +} + + +@article{andersson2020single, + title = {Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography}, + author = {Alma Andersson and Joseph Bergenstr{\aa}hle and Michaela Asp and Ludvig Bergenstr{\aa}hle and Aleksandra Jurek and Jos{\'{e}} Fern{\'{a}}ndez Navarro and Joakim Lundeberg}, + year = {2020}, + month = {Oct.}, + journal = {Communications Biology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {3}, + number = {1}, + doi = {10.1038/s42003-020-01247-y}, + url = {https://doi.org/10.1038/s42003-020-01247-y} +} + + +@article{andersson2021sepal, + title={sepal: Identifying transcript profiles with spatial patterns by diffusion-based modeling}, + author={Andersson, Alma and Lundeberg, Joakim}, + journal={Bioinformatics}, + volume={37}, + number={17}, + pages={2644--2650}, + year={2021}, + publisher={Oxford University Press}, + doi={10.1093/bioinformatics/btab164} +} + + +@string{apr = {Apr.}} + + +@string{aug = {Aug.}} + + +@article{batson2019molecular, + title = {Molecular Cross-Validation for Single-Cell RNA-seq}, + author = {Batson, Joshua and Royer, Lo{\"\i}c and Webber, James}, + year = {2019}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + doi = {10.1101/786269}, + url = {https://www.biorxiv.org/content/early/2019/09/30/786269}, + elocation-id = {786269}, + eprint = {https://www.biorxiv.org/content/early/2019/09/30/786269.full.pdf} +} + + +@article{biancalani2021deep, + title = {Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram}, + author = {Tommaso Biancalani and Gabriele Scalia and Lorenzo Buffoni and Raghav Avasthi and Ziqing Lu and Aman Sanger and Neriman Tokcan and Charles R. Vanderburg and {\AA}sa Segerstolpe and Meng Zhang and Inbal Avraham-Davidi and Sanja Vickovic and Mor Nitzan and Sai Ma and Ayshwarya Subramanian and Michal Lipinski and Jason Buenrostro and Nik Bear Brown and Duccio Fanelli and Xiaowei Zhuang and Evan Z. 
Macosko and Aviv Regev}, + year = {2021}, + month = {Oct.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {18}, + number = {11}, + pages = {1352--1362}, + doi = {10.1038/s41592-021-01264-7}, + url = {https://doi.org/10.1038/s41592-021-01264-7} +} + + +@article{bintayyash2021non, + author = {BinTayyash, Nuha and Georgaka, Sokratia and John, S T and Ahmed, Sumon and Boukouvalas, Alexis and Hensman, James and Rattray, Magnus}, + title = "{Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments}", + journal = {Bioinformatics}, + volume = {37}, + number = {21}, + pages = {3788-3795}, + year = {2021}, + month = {07}, + issn = {1367-4803}, + doi = {10.1093/bioinformatics/btab486}, + url = {https://doi.org/10.1093/bioinformatics/btab486}, + eprint = {https://academic.oup.com/bioinformatics/article-pdf/37/21/3788/50336570/btab486.pdf}, +} + + +@article{bland2000odds, + title = {Statistics Notes: The odds ratio}, + author = {J. M. Bland}, + year = {2000}, + month = {May}, + journal = {{BMJ}}, + publisher = {{BMJ}}, + volume = {320}, + number = {7247}, + pages = {1468--1468}, + doi = {10.1136/bmj.320.7247.1468}, + url = {https://doi.org/10.1136/bmj.320.7247.1468} +} + + +@article{breiman2001random, + doi = {10.1023/a:1010933404324}, + url = {https://doi.org/10.1023/a:1010933404324}, + year = {2001}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {45}, + number = {1}, + pages = {5--32}, + author = {Leo Breiman}, + journal = {Machine Learning} +} + + +@article{bttner2018test, + title = {A test metric for assessing single-cell {RNA}-seq batch correction}, + author = {Maren B\"{u}ttner and Zhichao Miao and F. Alexander Wolf and Sarah A. Teichmann and Fabian J. Theis}, + year = {2018}, + month = {Dec.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {16}, + number = {1}, + pages = {43--49}, + doi = {10.1038/s41592-018-0254-1}, + url = {https://doi.org/10.1038/s41592-018-0254-1} +} + + +@article{cabello2020singlecellsignalr, + title = {{SingleCellSignalR}: inference of intercellular networks from single-cell transcriptomics}, + author = {Simon Cabello-Aguilar and M{\'{e}}lissa Alame and Fabien Kon-Sun-Tack and Caroline Fau and Matthieu Lacroix and Jacques Colinge}, + year = {2020}, + month = {Mar.}, + journal = {Nucleic Acids Research}, + publisher = {Oxford University Press ({OUP})}, + volume = {48}, + number = {10}, + pages = {e55--e55}, + doi = {10.1093/nar/gkaa183}, + url = {https://doi.org/10.1093/nar/gkaa183} +} + + +@article{cable2021robust, + title = {Robust decomposition of cell type mixtures in spatial transcriptomics}, + author = {Dylan M. Cable and Evan Murray and Luli S. Zou and Aleksandrina Goeva and Evan Z. Macosko and Fei Chen and Rafael A. 
Irizarry}, + year = {2021}, + month = {Feb.}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {40}, + number = {4}, + pages = {517--526}, + doi = {10.1038/s41587-021-00830-w}, + url = {https://doi.org/10.1038/s41587-021-00830-w} +} + + +@misc{cannoodt2021viashfromscripts, + doi = {10.48550/ARXIV.2110.11494}, + url = {https://arxiv.org/abs/2110.11494}, + author = {Cannoodt, Robrecht and Cannoodt, Hendrik and Van de Kerckhove, Eric and Boschmans, Andy and De Maeyer, Dries and Verbeiren, Toni}, + keywords = {Software Engineering (cs.SE), FOS: Computer and information sciences, FOS: Computer and information sciences}, + title = {Viash: from scripts to pipelines}, + publisher = {arXiv}, + year = {2021}, + copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International} +} + + +@article{cai2023spanve, + title={Spanve: an Statistical Method to Detect Clustering-friendly Spatially Variable Genes in Large-scale Spatial Transcriptomics Data}, + author={Cai, Guoxin and Chen, Yichang and Chen, Shuqing and Gu, Xun and Zhou, Zhan}, + journal={bioRxiv}, + pages={2023--02}, + year={2023}, + publisher={Cold Spring Harbor Laboratory}, + doi={10.1101/2023.02.08.527623} +} + + +@article{cao2018joint, + title = {Joint profiling of chromatin accessibility and gene expression in thousands of single cells}, + author = {Junyue Cao and Darren A. Cusanovich and Vijay Ramani and Delasa Aghamirzaie and Hannah A. Pliner and Andrew J. Hill and Riza M. Daza and Jose L. McFaline-Figueroa and Jonathan S. Packer and Lena Christiansen and Frank J. Steemers and Andrew C. Adey and Cole Trapnell and Jay Shendure}, + year = {2018}, + month = {Sept.}, + journal = {Science}, + publisher = {American Association for the Advancement of Science ({AAAS})}, + volume = {361}, + number = {6409}, + pages = {1380--1385}, + doi = {10.1126/science.aau0730}, + url = {https://doi.org/10.1126/science.aau0730} +} + + +@article{cao2020human, + title = {A human cell atlas of fetal gene expression}, + author = {Junyue Cao and Diana R. O'Day and Hannah A. Pliner and Paul D. Kingsley and Mei Deng and Riza M. Daza and Michael A. Zager and Kimberly A. Aldinger and Ronnie Blecher-Gonen and Fan Zhang and Malte Spielmann and James Palis and Dan Doherty and Frank J. Steemers and Ian A. Glass and Cole Trapnell and Jay Shendure}, + year = {2020}, + month = {Nov.}, + journal = {Science}, + publisher = {American Association for the Advancement of Science ({AAAS})}, + volume = {370}, + number = {6518}, + doi = {10.1126/science.aba7721}, + url = {https://doi.org/10.1126/science.aba7721} +} + + +@article{chai2014root, + doi = {10.5194/gmdd-7-1525-2014}, + url = {https://doi.org/10.5194/gmdd-7-1525-2014}, + year = {2014}, + month = {Feb.}, + publisher = {Copernicus {GmbH}}, + author = {T. Chai and R. R. 
Draxler}, + title = {Root mean square error ({RMSE}) or mean absolute error ({MAE})?} +} + + +@article{chang2022spatial, + title={Spatial omics representation and functional tissue module inference using graph Fourier transform}, + author={Chang, Yuzhou and Liu, Jixin and Ma, Anjun and Jiang, Sizun and Krull, Jordan and Yeo, Yao Yu and Liu, Yang and Rodig, Scott J and Barouch, Dan H and Fan, Rong and others}, + journal={bioRxiv}, + pages={2022--12}, + year={2022}, + publisher={Cold Spring Harbor Laboratory}, + doi={10.1101/2022.12.10.519929} +} + + +@article{chazarragil2021flexible, + doi = {10.1093/nar/gkab004}, + url = {https://doi.org/10.1093/nar/gkab004}, + year = {2021}, + month = {Feb.}, + publisher = {Oxford University Press ({OUP})}, + volume = {49}, + number = {7}, + pages = {e42--e42}, + author = {Ruben Chazarra-Gil and Stijn van~Dongen and Vladimir~Yu Kiselev and Martin Hemberg}, + title = {Flexible comparison of batch correction methods for single-cell {RNA}-seq using {BatchBench}}, + journal = {Nucleic Acids Research} +} + + +@article{chen2009local, + title = {Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis}, + author = {Lisha Chen and Andreas Buja}, + year = {2009}, + month = {Mar.}, + journal = {Journal of the American Statistical Association}, + publisher = {Informa {UK} Limited}, + volume = {104}, + number = {485}, + pages = {209--219}, + doi = {10.1198/jasa.2009.0111}, + url = {https://doi.org/10.1198/jasa.2009.0111} +} + + +@inproceedings{chen2016xgboost, + title = {{XGBoost}}, + author = {Tianqi Chen and Carlos Guestrin}, + year = {2016}, + month = {Aug.}, + booktitle = {Proceedings of the 22nd {ACM} {SIGKDD} International Conference on Knowledge Discovery and Data Mining}, + publisher = {{Acm}}, + doi = {10.1145/2939672.2939785}, + url = {https://doi.org/10.1145/2939672.2939785} +} + + +@article{cichocki2009fast, + title = {Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations}, + author = {Andrzej Cichocki and Anh-Huy Phan}, + year = {2009}, + journal = {{IEICE} Transactions on Fundamentals of Electronics, Communications and Computer Sciences}, + publisher = {Institute of Electronics, Information and Communications Engineers ({IEICE})}, + volume = {E92-a}, + number = {3}, + pages = {708--721}, + doi = {10.1587/transfun.e92.a.708}, + url = {https://doi.org/10.1587/transfun.e92.a.708} +} + + +@article{coifman2006diffusion, + title = {Diffusion maps}, + author = {Ronald R. Coifman and St{\'{e}}phane Lafon}, + year = {2006}, + month = {Jul.}, + journal = {Applied and Computational Harmonic Analysis}, + publisher = {Elsevier {BV}}, + volume = {21}, + number = {1}, + pages = {5--30}, + doi = {10.1016/j.acha.2006.04.006}, + url = {https://doi.org/10.1016/j.acha.2006.04.006} +} + + +@article{cover1967nearest, + title = {Nearest neighbor pattern classification}, + author = {T. Cover and P. 
Hart}, + year = {1967}, + month = {Jan}, + journal = {{IEEE} Transactions on Information Theory}, + publisher = {Institute of Electrical and Electronics Engineers ({IEEE})}, + volume = {13}, + number = {1}, + pages = {21--27}, + doi = {10.1109/tit.1967.1053964}, + url = {https://doi.org/10.1109/tit.1967.1053964} +} + + +@inproceedings{davis2006prauc, + title = {The relationship between Precision-Recall and {ROC} curves}, + author = {Jesse Davis and Mark Goadrich}, + year = {2006}, + booktitle = {Proceedings of the 23rd international conference on Machine learning - {ICML} {\textquotesingle}06}, + publisher = {{ACM} Press}, + doi = {10.1145/1143844.1143874}, + url = {https://doi.org/10.1145/1143844.1143874} +} + + +@string{dec = {Dec.}} + +@article{Demetci2020scot, + author = {Pinar Demetci and Rebecca Santorella and Bj{\"o}rn Sandstede and William Stafford Noble and Ritambhara Singh}, + title = {Gromov-Wasserstein optimal transport to align single-cell multi-omics data}, + elocation-id = {2020.04.28.066787}, + year = {2020}, + doi = {10.1101/2020.04.28.066787}, + publisher = {Cold Spring Harbor Laboratory}, + URL = {https://www.biorxiv.org/content/early/2020/11/11/2020.04.28.066787}, + eprint = {https://www.biorxiv.org/content/early/2020/11/11/2020.04.28.066787.full.pdf}, + journal = {bioRxiv} +} + + +@article{dimitrov2022comparison, + title = {Comparison of methods and resources for cell-cell communication inference from single-cell {RNA}-Seq data}, + author = {Daniel Dimitrov and D{\'{e}}nes T\"{u}rei and Martin Garrido-Rodriguez and Paul L. Burmedi and James S. Nagai and Charlotte Boys and Ricardo O. Ramirez Flores and Hyojin Kim and Bence Szalai and Ivan G. Costa and Alberto Valdeolivas and Aur{\'{e}}lien Dugourd and Julio Saez-Rodriguez}, + year = {2022}, + month = {Jun.}, + journal = {Nature Communications}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {13}, + number = {1}, + doi = {10.1038/s41467-022-30755-0}, + url = {https://doi.org/10.1038/s41467-022-30755-0} +} + + +@article{donoho2017yearsdatascience, + doi = {10.1080/10618600.2017.1384734}, + url = {https://doi.org/10.1080/10618600.2017.1384734}, + year = {2017}, + month = {Oct.}, + publisher = {Informa {UK} Limited}, + volume = {26}, + number = {4}, + pages = {745--766}, + author = {David Donoho}, + title = {50 Years of Data Science}, + journal = {Journal of Computational and Graphical Statistics} +} + + +@article{efremova2020cellphonedb, + title = {{CellPhoneDB}: inferring cell{\textendash}cell communication from combined expression of multi-subunit ligand{\textendash}receptor complexes}, + author = {Mirjana Efremova and Miquel Vento-Tormo and Sarah A. 
Teichmann and Roser Vento-Tormo}, + year = {2020}, + month = {Feb.}, + journal = {Nature Protocols}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {15}, + number = {4}, + pages = {1484--1506}, + doi = {10.1038/s41596-020-0292-x}, + url = {https://doi.org/10.1038/s41596-020-0292-x} +} + + +@article{emmons2016analysis, + title = {Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale}, + volume = {11}, + ISSN = {1932-6203}, + url = {http://dx.doi.org/10.1371/journal.pone.0159161}, + doi = {10.1371/journal.pone.0159161}, + number = {7}, + journal = {PLOS ONE}, + publisher = {Public Library of Science (PLoS)}, + author = {Emmons, Scott and Kobourov, Stephen and Gallant, Mike and B\"{o}rner, Katy}, + editor = {Dovrolis, Constantine}, + year = {2016}, + month = jul, + pages = {e0159161} +} + + +@article{eraslan2019single, + title = {Single-cell {RNA}-seq denoising using a deep count autoencoder}, + author = {G\"{o}kcen Eraslan and Lukas M. Simon and Maria Mircea and Nikola S. Mueller and Fabian J. Theis}, + year = {2019}, + month = {Jan}, + journal = {Nature Communications}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {10}, + number = {1}, + doi = {10.1038/s41467-018-07931-2}, + url = {https://doi.org/10.1038/s41467-018-07931-2} +} + + +@article{fang2022conservation, + title = {Conservation and divergence of cortical cell organization in human and mouse revealed by MERFISH}, + volume = {377}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.abm1741}, + DOI = {10.1126/science.abm1741}, + number = {6601}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Fang, Rongxin and Xia, Chenglong and Close, Jennie L. and Zhang, Meng and He, Jiang and Huang, Zhengkai and Halpern, Aaron R. and Long, Brian and Miller, Jeremy A. and Lein, Ed S. and Zhuang, Xiaowei}, + year = {2022}, + month = jul, + pages = {56-62} +} + + +@string{feb = {Feb.}} + + +@article{fix1989discriminatory, + doi = {10.2307/1403797}, + url = {https://doi.org/10.2307/1403797}, + year = {1989}, + month = {Dec.}, + publisher = {{JSTOR}}, + volume = {57}, + number = {3}, + pages = {238}, + author = {Evelyn Fix and J. L. Hodges}, + title = {Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties}, + journal = {International Statistical Review / Revue Internationale de Statistique} +} + + +@article{gower1975generalized, + title = {Generalized procrustes analysis}, + author = {J. C. Gower}, + year = {1975}, + month = {Mar.}, + journal = {Psychometrika}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {40}, + number = {1}, + pages = {33--51}, + doi = {10.1007/bf02291478}, + url = {https://doi.org/10.1007/bf02291478} +} + + +@article{grandini2020metrics, + title = {Metrics for Multi-Class Classification: an Overview}, + author = {Grandini, Margherita and Bagli, Enrico and Visani, Giorgio}, + year = {2020}, + journal = {arXiv}, + publisher = {Cornell University}, + doi = {10.48550/arxiv.2008.05756}, + url = {https://arxiv.org/abs/2008.05756}, + copyright = {arXiv.org perpetual, non-exclusive license}, + keywords = {Machine Learning (stat.ML), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences} +} + + +@article{granja2021archr, + title = {{ArchR} is a scalable software package for integrative single-cell chromatin accessibility analysis}, + author = {Jeffrey M. Granja and M. Ryan Corces and Sarah E. 
Pierce and S. Tansu Bagdatli and Hani Choudhry and Howard Y. Chang and William J. Greenleaf}, + year = {2021}, + month = {Feb.}, + journal = {Nature Genetics}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {53}, + number = {3}, + pages = {403--411}, + doi = {10.1038/s41588-021-00790-6}, + url = {https://doi.org/10.1038/s41588-021-00790-6} +} + + +@article{grn2014validation, + title = {Validation of noise models for single-cell transcriptomics}, + author = {Dominic Gr\"{u}n and Lennart Kester and Alexander van Oudenaarden}, + year = {2014}, + month = {Apr.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {11}, + number = {6}, + pages = {637--640}, + doi = {10.1038/nmeth.2930}, + url = {https://doi.org/10.1038/nmeth.2930} +} + + +@article{haghverdi2018batch, + title = {Batch effects in single-cell {RNA}-sequencing data are corrected by matching mutual nearest neighbors}, + author = {Laleh Haghverdi and Aaron T L Lun and Michael D Morgan and John C Marioni}, + year = {2018}, + month = {Apr.}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {36}, + number = {5}, + pages = {421--427}, + doi = {10.1038/nbt.4091}, + url = {https://doi.org/10.1038/nbt.4091} +} + + +@article{hammarlund2018cengen, + title = {The {CeNGEN} Project: The Complete Gene Expression Map of an Entire Nervous System}, + author = {Marc Hammarlund and Oliver Hobert and David M. Miller and Nenad Sestan}, + year = {2018}, + month = {Aug.}, + journal = {Neuron}, + publisher = {Elsevier {BV}}, + volume = {99}, + number = {3}, + pages = {430--433}, + doi = {10.1016/j.neuron.2018.07.042}, + url = {https://doi.org/10.1016/j.neuron.2018.07.042} +} + + +@article{hansen2012removing, + title = {Adjusting batch effects in microarray expression data using empirical Bayes methods}, + author = {W. Evan Johnson and Cheng Li and Ariel Rabinovic}, + year = {2006}, + month = {Apr.}, + journal = {Biostatistics}, + publisher = {Oxford University Press ({OUP})}, + volume = {8}, + number = {1}, + pages = {118--127}, + doi = {10.1093/biostatistics/kxj037}, + url = {https://doi.org/10.1093/biostatistics/kxj037} +} + + +@article{hao2021integrated, + title = {Integrated analysis of multimodal single-cell data}, + author = {Yuhan Hao and Stephanie Hao and Erica Andersen-Nissen and William M. Mauck and Shiwei Zheng and Andrew Butler and Maddie J. Lee and Aaron J. Wilk and Charlotte Darby and Michael Zager and Paul Hoffman and Marlon Stoeckius and Efthymia Papalexi and Eleni P. Mimitou and Jaison Jain and Avi Srivastava and Tim Stuart and Lamar M. Fleming and Bertrand Yeung and Angela J. Rogers and Juliana M. McElrath and Catherine A. 
Blish and Raphael Gottardo and Peter Smibert and Rahul Satija}, + year = {2021}, + month = {Jun.}, + journal = {Cell}, + publisher = {Elsevier {BV}}, + volume = {184}, + number = {13}, + pages = {3573--3587.e29}, + doi = {10.1016/j.cell.2021.04.048}, + url = {https://doi.org/10.1016/j.cell.2021.04.048} +} + + +@article{hao2021somde, + title={SOMDE: a scalable method for identifying spatially variable genes with self-organizing map}, + author={Hao, Minsheng and Hua, Kui and Zhang, Xuegong}, + journal={Bioinformatics}, + volume={37}, + number={23}, + pages={4392--4398}, + year={2021}, + publisher={Oxford University Press}, + doi={10.1093/bioinformatics/btab471} +} + + +@article{hie2019efficient, + title = {Efficient integration of heterogeneous single-cell transcriptomes using Scanorama}, + author = {Brian Hie and Bryan Bryson and Bonnie Berger}, + year = {2019}, + month = {May}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {37}, + number = {6}, + pages = {685--691}, + doi = {10.1038/s41587-019-0113-3}, + url = {https://doi.org/10.1038/s41587-019-0113-3} +} + + +@article{hinton1989connectionist, + title = {Connectionist learning procedures}, + author = {Geoffrey E. Hinton}, + year = {1989}, + month = {Sept.}, + journal = {Artificial Intelligence}, + publisher = {Elsevier {BV}}, + volume = {40}, + number = {1-3}, + pages = {185--234}, + doi = {10.1016/0004-3702(89)90049-0}, + url = {https://doi.org/10.1016/0004-3702(89)90049-0} +} + + +@book{hosmer2013applied, + title = {Applied logistic regression}, + author = {Hosmer Jr, D.W. and Lemeshow, S. and Sturdivant, R.X.}, + year = {2013}, + publisher = {John Wiley \& Sons}, + volume = {398} +} + + +@article{hou2019scmatch, + title = {{scMatch}: a single-cell gene expression profile annotation tool using reference datasets}, + author = {Rui Hou and Elena Denisenko and Alistair R R Forrest}, + year = {2019}, + month = {Apr.}, + journal = {Bioinformatics}, + publisher = {Oxford University Press ({OUP})}, + volume = {35}, + number = {22}, + pages = {4688--4695}, + doi = {10.1093/bioinformatics/btz292}, + url = {https://doi.org/10.1093/bioinformatics/btz292}, + editor = {Janet Kelso} +} + + +@article{hou2020predicting, + title = {Predicting cell-to-cell communication networks using {NATMI}}, + author = {Rui Hou and Elena Denisenko and Huan Ting Ong and Jordan A. Ramilowski and Alistair R. R. Forrest}, + year = {2020}, + month = {Oct.}, + journal = {Nature Communications}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {11}, + number = {1}, + doi = {10.1038/s41467-020-18873-z}, + url = {https://doi.org/10.1038/s41467-020-18873-z} +} + + +@article{hou2020systematic, + title = {A systematic evaluation of single-cell {RNA}-sequencing imputation methods}, + author = {Wenpin Hou and Zhicheng Ji and Hongkai Ji and Stephanie C. 
Hicks}, + year = {2020}, + month = {Aug.}, + journal = {Genome Biology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {21}, + number = {1}, + doi = {10.1186/s13059-020-02132-x}, + url = {https://doi.org/10.1186/s13059-020-02132-x} +} + + +@article{hubert1985comparing, + doi = {10.1007/bf01908075}, + url = {https://doi.org/10.1007/bf01908075}, + year = {1985}, + month = {Dec.}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {2}, + number = {1}, + pages = {193--218}, + author = {Lawrence Hubert and Phipps Arabie}, + title = {Comparing partitions}, + journal = {Journal of Classification} +} + + +@article{hu2021spagcn, + title={SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network}, + author={Hu, Jian and Li, Xiangjie and Coleman, Kyle and Schroeder, Amelia and Ma, Nan and Irwin, David J and Lee, Edward B and Shinohara, Russell T and Li, Mingyao}, + journal={Nature methods}, + volume={18}, + number={11}, + pages={1342--1351}, + year={2021}, + publisher={Nature Publishing Group US New York}, + doi={10.1038/s41592-021-01255-8} +} + + +@string{jan = {Jan}} + + +@string{jul = {Jul.}} + + +@string{jun = {Jun.}} + + +@article{kats2021spatialde2, + title={SpatialDE2: fast and localized variance component analysis of spatial transcriptomics}, + author={Kats, Ilia and Vento-Tormo, Roser and Stegle, Oliver}, + journal={bioRxiv}, + pages={2021--10}, + year={2021}, + publisher={Cold Spring Harbor Laboratory}, + doi={10.1101/2021.10.27.466045} +} + + +@article{kendall1938new, + doi = {10.1093/biomet/30.1-2.81}, + url = {https://doi.org/10.1093/biomet/30.1-2.81}, + year = {1938}, + month = {Jun.}, + publisher = {Oxford University Press ({OUP})}, + volume = {30}, + number = {1-2}, + pages = {81--93}, + author = {M. G. Kendall}, + title = {A new measure of rank correlation}, + journal = {Biometrika} +} + + +@article{kiselev2019challenges, + title = {Challenges in unsupervised clustering of single-cell {RNA}-seq data}, + author = {Vladimir Yu Kiselev and Tallulah S. Andrews and Martin Hemberg}, + year = {2019}, + month = {Jan}, + journal = {Nature Reviews Genetics}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {20}, + number = {5}, + pages = {273--282}, + doi = {10.1038/s41576-018-0088-9}, + url = {https://doi.org/10.1038/s41576-018-0088-9} +} + + +@article{kleshchevnikov2022cell2location, + title = {Cell2location maps fine-grained cell types in spatial transcriptomics}, + author = {Vitalii Kleshchevnikov and Artem Shmatko and Emma Dann and Alexander Aivazidis and Hamish W.
King and Tong Li and Rasa Elmentaite and Artem Lomakin and Veronika Kedlian and Adam Gayoso and Mika Sarkin Jain and Jun Sung Park and Lauma Ramona and Elizabeth Tuck and Anna Arutyunyan and Roser Vento-Tormo and Moritz Gerstung and Louisa James and Oliver Stegle and Omer Ali Bayraktar}, + year = {2022}, + month = {Jan}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {40}, + number = {5}, + pages = {661--671}, + doi = {10.1038/s41587-021-01139-4}, + url = {https://doi.org/10.1038/s41587-021-01139-4} +} + + +@article{korsunsky2019fast, + title = {Fast, sensitive and accurate integration of single-cell data with Harmony}, + author = {Ilya Korsunsky and Nghia Millard and Jean Fan and Kamil Slowikowski and Fan Zhang and Kevin Wei and Yuriy Baglaenko and Michael Brenner and Po-ru Loh and Soumya Raychaudhuri}, + year = {2019}, + month = {Nov.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {16}, + number = {12}, + pages = {1289--1296}, + doi = {10.1038/s41592-019-0619-0}, + url = {https://doi.org/10.1038/s41592-019-0619-0} +} + + +@article{kraemer2018dimred, + title = {{dimRed} and {coRanking} - Unifying Dimensionality Reduction in R}, + author = {Guido Kraemer and Markus Reichstein and Miguel D. Mahecha}, + year = {2018}, + journal = {The R Journal}, + publisher = {The R Foundation}, + volume = {10}, + number = {1}, + pages = {342}, + doi = {10.32614/rj-2018-039}, + url = {https://doi.org/10.32614/rj-2018-039} +} + + +@article{kruskal1964mds, + title = {Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis}, + author = {J. B. Kruskal}, + year = {1964}, + month = {Mar.}, + journal = {Psychometrika}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {29}, + number = {1}, + pages = {1--27}, + doi = {10.1007/bf02289565}, + url = {https://doi.org/10.1007/bf02289565} +} + + +@article{kuppe2022spatial, + title={Spatial multi-omic map of human myocardial infarction}, + author={Kuppe, Christoph and Ramirez Flores, Ricardo O and Li, Zhijian and Hayat, Sikander and Levinson, Rebecca T and Liao, Xian and Hannani, Monica T and Tanevski, Jovan and W{\"u}nnemann, Florian and Nagai, James S and others}, + journal={Nature}, + volume={608}, + number={7924}, + pages={766--777}, + year={2022}, + publisher={Nature Publishing Group UK London} +} + + +@article{lance2022multimodal, + title = {Multimodal single cell data integration challenge: results and lessons learned}, + author = {Lance, Christopher and Luecken, Malte D. and Burkhardt, Daniel B.
and Cannoodt, Robrecht and Rautenstrauch, Pia and Laddach, Anna and Ubingazhibov, Aidyn and Cao, Zhi-Jie and Deng, Kaiwen and Khan, Sumeer and Liu, Qiao and Russkikh, Nikolay and Ryazantsev, Gleb and Ohler, Uwe and , and Pisco, Angela Oliveira and Bloom, Jonathan and Krishnaswamy, Smita and Theis, Fabian J.}, + year = {2022}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + doi = {10.1101/2022.04.11.487796}, + url = {https://www.biorxiv.org/content/early/2022/04/12/2022.04.11.487796}, + elocation-id = {2022.04.11.487796}, + eprint = {https://www.biorxiv.org/content/early/2022/04/12/2022.04.11.487796.full.pdf} +} + + +@article{lance2024predicting, + title = {Predicting cellular profiles across modalities in longitudinal single-cell data: An Open Problems competition}, + author = {...}, + year = {2024}, + journal = {In preparation}, +} + + +@book{lawson1995solving, + title = {Solving Least Squares Problems}, + author = {Charles L. Lawson and Richard J. Hanson}, + year = {1995}, + month = {Jan}, + publisher = {Society for Industrial and Applied Mathematics}, + doi = {10.1137/1.9781611971217}, + url = {https://doi.org/10.1137/1.9781611971217} +} + + +@article{lee2009quality, + title = {Quality assessment of dimensionality reduction: Rank-based criteria}, + author = {John A. Lee and Michel Verleysen}, + year = {2009}, + month = {Mar.}, + journal = {Neurocomputing}, + publisher = {Elsevier {BV}}, + volume = {72}, + number = {7-9}, + pages = {1431--1443}, + doi = {10.1016/j.neucom.2008.12.017}, + url = {https://doi.org/10.1016/j.neucom.2008.12.017} +} + + +@article{li2021bayesian, + author = {Li, Qiwei and Zhang, Minzhe and Xie, Yang and Xiao, Guanghua}, + title = "{Bayesian modeling of spatial molecular profiling data via Gaussian process}", + journal = {Bioinformatics}, + volume = {37}, + number = {22}, + pages = {4129-4136}, + year = {2021}, + month = {06}, + abstract = "{The location, timing and abundance of gene expression (both mRNA and proteins) within a tissue define the molecular mechanisms of cell functions. Recent technology breakthroughs in spatial molecular profiling, including imaging-based technologies and sequencing-based technologies, have enabled the comprehensive molecular characterization of single cells while preserving their spatial and morphological contexts. This new bioinformatics scenario calls for effective and robust computational methods to identify genes with spatial patterns.We represent a novel Bayesian hierarchical model to analyze spatial transcriptomics data, with several unique characteristics. It models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model that greatly increases model stability and robustness. Besides, the Bayesian inference framework allows us to borrow strength in parameter estimation in a de novo fashion. As a result, the proposed model shows competitive performances in accuracy and robustness over existing methods in both simulation studies and two real data applications.The related R/C++ source code is available at https://github.com/Minzhe/BOOST-GP.Supplementary data are available at Bioinformatics online. 
}", + issn = {1367-4803}, + doi = {10.1093/bioinformatics/btab455}, + url = {https://doi.org/10.1093/bioinformatics/btab455}, + eprint = {https://academic.oup.com/bioinformatics/article-pdf/37/22/4129/50335106/btab455.pdf}, +} + + +@article{linderman2018zero, + title = {Zero-preserving imputation of scRNA-seq data using low-rank approximation}, + author = {Linderman, George C. and Zhao, Jun and Kluger, Yuval}, + year = {2018}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + doi = {10.1101/397588}, + url = {https://www.biorxiv.org/content/early/2018/08/22/397588}, + elocation-id = {397588}, + eprint = {https://www.biorxiv.org/content/early/2018/08/22/397588.full.pdf} +} + + +@article{liu2020high, + title = {High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue}, + volume = {183}, + ISSN = {0092-8674}, + url = {http://dx.doi.org/10.1016/j.cell.2020.10.026}, + DOI = {10.1016/j.cell.2020.10.026}, + number = {6}, + journal = {Cell}, + publisher = {Elsevier BV}, + author = {Liu, Yang and Yang, Mingyu and Deng, Yanxiang and Su, Graham and Enninful, Archibald and Guo, Cindy C. and Tebaldi, Toma and Zhang, Di and Kim, Dongjoo and Bai, Zhiliang and Norris, Eileen and Pan, Alisia and Li, Jiatong and Xiao, Yang and Halene, Stephanie and Fan, Rong}, + year = {2020}, + month = dec, + pages = {1665--1681.e18} +} + + +@article{lohoff2021integration, + title = {Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis}, + volume = {40}, + ISSN = {1546-1696}, + url = {http://dx.doi.org/10.1038/s41587-021-01006-2}, + DOI = {10.1038/s41587-021-01006-2}, + number = {1}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media LLC}, + author = {Lohoff, T. and Ghazanfar, S. and Missarova, A. and Koulena, N. and Pierson, N. and Griffiths, J. A. and Bardot, E. S. and Eng, C.-H. L. and Tyser, R. C. V. and Argelaguet, R. and Guibentif, C. and Srinivas, S. and Briscoe, J. and Simons, B. D. and Hadjantonakis, A.-K. and G\"{o}ttgens, B. and Reik, W. and Nichols, J. and Cai, L. and Marioni, J. C.}, + year = {2021}, + month = sep, + pages = {74-85} +} + + +@article{lopez2018deep, + title = {Deep generative modeling for single-cell transcriptomics}, + author = {Romain Lopez and Jeffrey Regier and Michael B. Cole and Michael I. Jordan and Nir Yosef}, + year = {2018}, + month = {Nov.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {15}, + number = {12}, + pages = {1053--1058}, + doi = {10.1038/s41592-018-0229-2}, + url = {https://doi.org/10.1038/s41592-018-0229-2} +} + + +@article{lopez2022destvi, + title = {{DestVI} identifies continuums of cell types in spatial transcriptomics data}, + author = {Romain Lopez and Baoguo Li and Hadas Keren-Shaul and Pierre Boyeau and Merav Kedmi and David Pilzer and Adam Jelinski and Ido Yofe and Eyal David and Allon Wagner and Can Ergen and Yoseph Addadi and Ofra Golani and Franca Ronchese and Michael I. Jordan and Ido Amit and Nir Yosef}, + year = {2022}, + month = {Apr.}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {40}, + number = {9}, + pages = {1360--1369}, + doi = {10.1038/s41587-022-01272-8}, + url = {https://doi.org/10.1038/s41587-022-01272-8} +} + + +@article{lotfollahi2020query, + title = {Query to reference single-cell integration with transfer learning}, + author = {Lotfollahi, Mohammad and Naghipourfar, Mohsen and Luecken, Malte D. 
and Khajavi, Matin and B{\"u}ttner, Maren and Avsec, Ziga and Misharin, Alexander V. and Theis, Fabian J.}, + year = {2020}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + doi = {10.1101/2020.07.16.205997}, + url = {https://doi.org/10.1101/2020.07.16.205997}, + elocation-id = {2020.07.16.205997}, + eprint = {https://www.biorxiv.org/content/early/2020/07/16/2020.07.16.205997.full.pdf} +} + + +@article{luecken2022benchmarking, + title = {Benchmarking atlas-level data integration in single-cell genomics}, + author = {Malte D. Luecken and M. B\"{u}ttner and K. Chaichoompu and A. Danese and M. Interlandi and M. F. Mueller and D. C. Strobl and L. Zappia and M. Dugas and M. Colom{\'{e}}-Tatch{\'{e}} and Fabian J. Theis}, + year = {2021}, + month = {Dec.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {19}, + number = {1}, + pages = {41--50}, + doi = {10.1038/s41592-021-01336-8}, + url = {https://doi.org/10.1038/s41592-021-01336-8} +} + + +@article{lueks2011evaluate, + title = {How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix}, + author = {Lueks, Wouter and Mokbel, Bassam and Biehl, Michael and Hammer, Barbara}, + year = {2011}, + journal = {arXiv}, + doi = {10.48550/ARXIV.1110.3917}, + url = {https://arxiv.org/abs/1110.3917}, + copyright = {arXiv.org perpetual, non-exclusive license}, + keywords = {Machine Learning (cs.LG), Information Retrieval (cs.IR), FOS: Computer and information sciences, FOS: Computer and information sciences} +} + + +@misc{lun2019fastmnn, + title = {A description of the theory behind the fastMNN algorithm}, + author = {Lun, Aaron}, + year = {2019}, + url = {https://marionilab.github.io/FurtherMNN2018/theory/description.html} +} + + +@string{mar = {Mar.}} + + +@string{may = {May}} + + +@article{mcinnes2018umap, + title = {UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction}, + author = {McInnes, Leland and Healy, John and Melville, James}, + year = {2018}, + journal = {arXiv}, + publisher = {Cornell University}, + doi = {10.48550/arxiv.1802.03426}, + url = {https://arxiv.org/abs/1802.03426}, + copyright = {arXiv.org perpetual, non-exclusive license}, + keywords = {Machine Learning (stat.ML), Computational Geometry (cs.CG), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences} +} + + +@article{mereu2020benchmarking, + doi = {10.1038/s41587-020-0469-4}, + author = {Mereu, Elisabetta and Lafzi, Atefeh and Moutinho, Catia and Ziegenhain, Christoph and McCarthy, Davis J and Alvarez-Varela, Adrian and Batlle, Eduard and Sagar and Gruen, Dominic and Lau, Julia K and others}, + journal = {Nature biotechnology}, + number = {6}, + pages = {747--755}, + publisher = {Nature Publishing Group US New York}, + title = {Benchmarking single-cell {RNA}-sequencing protocols for cell atlas projects}, + volume = {38}, + year = {2020} +} + + +@inbook{miles2005rsquared, + title = {Encyclopedia of Statistics in Behavioral Science}, + author = {Jeremy Miles}, + year = {2005}, + month = {Oct.}, + publisher = {John Wiley {\&} Sons, Ltd}, + doi = {10.1002/0470013192.bsa526}, + url = {https://doi.org/10.1002/0470013192.bsa526}, + chapter = {{R-Squared}, Adjusted {R-Squared}} +} + + +@article{moon2019visualizing, + title = {Visualizing structure and transitions in high-dimensional biological data}, + author = {Kevin R. Moon and David van Dijk and Zheng Wang and Scott Gigante and Daniel B. Burkhardt and William S. 
Chen and Kristina Yim and Antonia van den Elzen and Matthew J. Hirn and Ronald R. Coifman and Natalia B. Ivanova and Guy Wolf and Smita Krishnaswamy}, + year = {2019}, + month = {Dec.}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {37}, + number = {12}, + pages = {1482--1492}, + doi = {10.1038/s41587-019-0336-3}, + url = {https://doi.org/10.1038/s41587-019-0336-3} +} + + +@article{narayan2021assessing, + title = {Assessing single-cell transcriptomic variability through density-preserving data visualization}, + author = {Ashwin Narayan and Bonnie Berger and Hyunghoon Cho}, + year = {2021}, + month = {Jan}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {39}, + number = {6}, + pages = {765--774}, + doi = {10.1038/s41587-020-00801-7}, + url = {https://doi.org/10.1038/s41587-020-00801-7} +} + + +@article{nestorowa2016single, + title = {A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation}, + author = {Sonia Nestorowa and Fiona K. Hamey and Blanca Pijuan Sala and Evangelia Diamanti and Mairi Shepherd and Elisa Laurenti and Nicola K. Wilson and David G. Kent and Berthold G\"{o}ttgens}, + year = {2016}, + month = {Aug.}, + journal = {Blood}, + publisher = {American Society of Hematology}, + volume = {128}, + number = {8}, + pages = {e20--e31}, + doi = {10.1182/blood-2016-05-716480}, + url = {https://doi.org/10.1182/blood-2016-05-716480} +} + + +@inproceedings{luecken2021neurips, + author = {Luecken, Malte and Burkhardt, Daniel and Cannoodt, Robrecht and Lance, Christopher and Agrawal, Aditi and Aliee, Hananeh and Chen, Ann and Deconinck, Louise and Detweiler, Angela and Granados, Alejandro and Huynh, Shelly and Isacco, Laura and Kim, Yang and Klein, Dominik and DE KUMAR, BONY and Kuppasani, Sunil and Lickert, Heiko and McGeever, Aaron and Melgarejo, Joaquin and Mekonen, Honey and Morri, Maurizio and M\"{u}ller, Michaela and Neff, Norma and Paul, Sheryl and Rieck, Bastian and Schneider, Kaylie and Steelman, Scott and Sterr, Michael and Treacy, Daniel and Tong, Alexander and Villani, Alexandra-Chloe and Wang, Guilin and Yan, Jia and Zhang, Ce and Pisco, Angela and Krishnaswamy, Smita and Theis, Fabian and Bloom, Jonathan M}, + booktitle = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks}, + editor = {J. Vanschoren and S. Yeung}, + pages = {}, + publisher = {Curran}, + title = {A sandbox for prediction and integration of DNA, RNA, and proteins in single cells}, + url = {https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf}, + volume = {1}, + year = {2021} +} + + +@string{nov = {Nov.}} + + +@string{oct = {Oct.}} + + +@article{olsson2016single, + title = {Single-cell analysis of mixed-lineage states leading to a binary cell fate choice}, + author = {Andre Olsson and Meenakshi Venkatasubramanian and Viren K. Chaudhri and Bruce J. Aronow and Nathan Salomonis and Harinder Singh and H. 
Leighton Grimes}, + year = {2016}, + month = {Aug.}, + journal = {Nature}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {537}, + number = {7622}, + pages = {698--702}, + doi = {10.1038/nature19348}, + url = {https://doi.org/10.1038/nature19348} +} + + +@misc{openproblems, + title = {Open Problems}, + author = {{Open Problems for Single Cell Analysis Consortium}}, + year = {2022}, + url = {https://openproblems.bio} +} + + +@article{palla2022squidpy, + title={Squidpy: a scalable framework for spatial omics analysis}, + author={Palla, Giovanni and Spitzer, Hannah and Klein, Michal and Fischer, David and Schaar, Anna Christina and Kuemmerle, Louis Benedikt and Rybakov, Sergei and Ibarra, Ignacio L and Holmberg, Olle and Virshup, Isaac and others}, + journal={Nature methods}, + volume={19}, + number={2}, + pages={171--178}, + year={2022}, + publisher={Nature Publishing Group US New York}, + doi={10.1038/s41592-021-01358-2} +} + + +@article{pearson1895regression, + doi = {10.1098/rspl.1895.0041}, + title = {VII. Note on regression and inheritance in the case of two parents}, + author = {Pearson, Karl}, + journal = {proceedings of the royal society of London}, + volume = {58}, + number = {347-352}, + pages = {240--242}, + year = {1895}, + publisher = {The Royal Society London} +} + + +@article{pearson1901pca, + title = {On lines and planes of closest fit to systems of points in space}, + author = {Karl Pearson}, + year = {1901}, + month = {Nov.}, + journal = {The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science}, + publisher = {Informa {UK} Limited}, + volume = {2}, + number = {11}, + pages = {559--572}, + doi = {10.1080/14786440109462720}, + url = {https://doi.org/10.1080/14786440109462720} +} + + +@article{pliner2019supervised, + title = {Supervised classification enables rapid annotation of cell atlases}, + author = {Hannah A. Pliner and Jay Shendure and Cole Trapnell}, + year = {2019}, + month = {Sept.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {16}, + number = {10}, + pages = {983--986}, + doi = {10.1038/s41592-019-0535-3}, + url = {https://doi.org/10.1038/s41592-019-0535-3} +} + + +@article{polanski2020bbknn, + title = {{BBKNN}: fast batch alignment of single cell transcriptomes}, + author = {Krzysztof Pola{\'{n}}ski and Matthew D Young and Zhichao Miao and Kerstin B Meyer and Sarah A Teichmann and Jong-Eun Park}, + year = {2019}, + month = {Aug.}, + journal = {Bioinformatics}, + publisher = {Oxford University Press ({OUP})}, + doi = {10.1093/bioinformatics/btz625}, + url = {https://doi.org/10.1093/bioinformatics/btz625}, + editor = {Bonnie Berger} +} + + +@article{raredon2022computation, + title = {Computation and visualization of cell{\textendash}cell signaling topologies in single-cell systems data using Connectome}, + author = {Micha Sam Brickman Raredon and Junchen Yang and James Garritano and Meng Wang and Dan Kushnir and Jonas Christian Schupp and Taylor S. Adams and Allison M. Greaney and Katherine L. Leiby and Naftali Kaminski and Yuval Kluger and Andre Levchenko and Laura E. 
Niklason}, + year = {2022}, + month = {Mar.}, + journal = {Scientific Reports}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {12}, + number = {1}, + doi = {10.1038/s41598-022-07959-x}, + url = {https://doi.org/10.1038/s41598-022-07959-x} +} + + +@article{rodriques2019slide, + title = {Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution}, + author = {Samuel G. Rodriques and Robert R. Stickels and Aleksandrina Goeva and Carly A. Martin and Evan Murray and Charles R. Vanderburg and Joshua Welch and Linlin M. Chen and Fei Chen and Evan Z. Macosko}, + year = {2019}, + month = {Mar.}, + journal = {Science}, + publisher = {American Association for the Advancement of Science ({AAAS})}, + volume = {363}, + number = {6434}, + pages = {1463--1467}, + doi = {10.1126/science.aaw1219}, + url = {https://doi.org/10.1126/science.aaw1219} +} + + +@article{russell2023slide, + title = {Slide-tags enables single-nucleus barcoding for multimodal spatial genomics}, + volume = {625}, + ISSN = {1476-4687}, + url = {http://dx.doi.org/10.1038/s41586-023-06837-4}, + DOI = {10.1038/s41586-023-06837-4}, + number = {7993}, + journal = {Nature}, + publisher = {Springer Science and Business Media LLC}, + author = {Russell, Andrew J. C. and Weir, Jackson A. and Nadaf, Naeem M. and Shabet, Matthew and Kumar, Vipin and Kambhampati, Sandeep and Raichur, Ruth and Marrero, Giovanni J. and Liu, Sophia and Balderrama, Karol S. and Vanderburg, Charles R. and Shanmugam, Vignesh and Tian, Luyi and Iorgulescu, J. Bryan and Yoon, Charles H. and Wu, Catherine J. and Macosko, Evan Z. and Chen, Fei}, + year = {2023}, + month = dec, + pages = {101--109} +} + + +@InProceedings{santos2009on, + author = {Santos, Jorge M. and Embrechts, Mark}, + editor = {Alippi, Cesare and Polycarpou, Marios and Panayiotou, Christos and Ellinas, Georgios}, + title = {On the Use of the Adjusted Rand Index as a Metric for Evaluating Supervised Classification}, + booktitle = {Artificial Neural Networks -- ICANN 2009}, + year = {2009}, + publisher = {Springer Berlin Heidelberg}, + address = {Berlin, Heidelberg}, + pages = {175--184}, + isbn = {978-3-642-04277-5}, + doi = {10.1007/978-3-642-04277-5_18}, + url = {https://doi.org/10.1007/978-3-642-04277-5_18} +} + + +@article{sarkar2021separating, + title = {Separating measurement and expression models clarifies confusion in single-cell {RNA} sequencing analysis}, + author = {Abhishek Sarkar and Matthew Stephens}, + year = {2021}, + month = {May}, + journal = {Nature Genetics}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {53}, + number = {6}, + pages = {770--777}, + doi = {10.1038/s41588-021-00873-4}, + url = {https://doi.org/10.1038/s41588-021-00873-4} +} + + +@article{schober2018correlation, + title = {Correlation Coefficients}, + author = {Patrick Schober and Christa Boer and Lothar A. Schwarte}, + year = {2018}, + month = {May}, + journal = {Anesthesia {\&} Analgesia}, + publisher = {Ovid Technologies (Wolters Kluwer Health)}, + volume = {126}, + number = {5}, + pages = {1763--1768}, + doi = {10.1213/ane.0000000000002864}, + url = {https://doi.org/10.1213/ane.0000000000002864} +} + + +@string{sep = {Sept.}} + + +@inproceedings{stanley2020harmonic, + title = {Harmonic Alignment}, + author = {Jay S.
Stanley and Scott Gigante and Guy Wolf and Smita Krishnaswamy}, + year = {2020}, + month = {Jan}, + booktitle = {Proceedings of the 2020 {SIAM} International Conference on Data Mining}, + publisher = {Society for Industrial and Applied Mathematics}, + pages = {316--324}, + doi = {10.1137/1.9781611976236.36}, + url = {https://doi.org/10.1137/1.9781611976236.36} +} + + +@article{stickels2020highly, + title = {Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2}, + volume = {39}, + ISSN = {1546-1696}, + url = {http://dx.doi.org/10.1038/s41587-020-0739-1}, + DOI = {10.1038/s41587-020-0739-1}, + number = {3}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media LLC}, + author = {Stickels, Robert R. and Murray, Evan and Kumar, Pawan and Li, Jilong and Marshall, Jamie L. and Di Bella, Daniela J. and Arlotta, Paola and Macosko, Evan Z. and Chen, Fei}, + year = {2020}, + month = dec, + pages = {313–319} +} + + +@article{stoeckius2017simultaneous, + title = {Simultaneous epitope and transcriptome measurement in single cells}, + author = {Marlon Stoeckius and Christoph Hafemeister and William Stephenson and Brian Houck-Loomis and Pratip K Chattopadhyay and Harold Swerdlow and Rahul Satija and Peter Smibert}, + year = {2017}, + month = {Jul.}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {14}, + number = {9}, + pages = {865--868}, + doi = {10.1038/nmeth.4380}, + url = {https://doi.org/10.1038/nmeth.4380} +} + + +@article{stuart2019comprehensive, + title = {Comprehensive Integration of Single-Cell Data}, + author = {Stuart, T. and Butler, A. and Hoffman, P. and Hafemeister, C. and Papalexi, E. and Mauck, W.M. and Hao, Y. and Stoeckius, M. and Smibert, P. and Satija, R.}, + year = {2019}, + journal = {Cell}, + volume = {177}, + number = {7}, + pages = {1888--1902.e21}, + doi = {10.1016/j.cell.2019.05.031} +} + + +@article{sun2020statistical, + title={Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies}, + author={Sun, Shiquan and Zhu, Jiaqiang and Zhou, Xiang}, + journal={Nature methods}, + volume={17}, + number={2}, + pages={193--200}, + year={2020}, + publisher={Nature Publishing Group US New York}, + doi={10.1038/s41592-019-0701-7} +} + + +@article{svensson2018spatialde, + title={SpatialDE: identification of spatially variable genes}, + author={Svensson, Valentine and Teichmann, Sarah A and Stegle, Oliver}, + journal={Nature methods}, + volume={15}, + number={5}, + pages={343--346}, + year={2018}, + publisher={Nature Publishing Group}, + doi={10.1038/nmeth.4636} +} + + +@article{szubert2019structurepreserving, + title = {Structure-preserving visualisation of high dimensional single-cell datasets}, + author = {Benjamin Szubert and Jennifer E. 
Cole and Claudia Monaco and Ignat Drozdov}, + year = {2019}, + month = {Jun.}, + journal = {Scientific Reports}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {9}, + number = {1}, + doi = {10.1038/s41598-019-45301-0}, + url = {https://doi.org/10.1038/s41598-019-45301-0} +} + + +@article{tabula2018single, + title = {Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris}, + author = {{Tabula Muris Consortium}}, + year = {2018}, + month = {Oct.}, + journal = {Nature}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {562}, + number = {7727}, + pages = {367--372}, + doi = {10.1038/s41586-018-0590-4}, + url = {https://doi.org/10.1038/s41586-018-0590-4} +} + + +@article{tabula2020single, + title = {A single-cell transcriptomic atlas characterizes ageing tissues in the mouse}, + author = {{Tabula Muris Consortium}}, + year = {2020}, + month = {Jul.}, + journal = {Nature}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {583}, + number = {7817}, + pages = {590--595}, + doi = {10.1038/s41586-020-2496-1}, + url = {https://doi.org/10.1038/s41586-020-2496-1} +} + + +@article{tasic2016adult, + title = {Adult mouse cortical cell taxonomy revealed by single cell transcriptomics}, + author = {Bosiljka Tasic and Vilas Menon and Thuc Nghi Nguyen and Tae Kyung Kim and Tim Jarsky and Zizhen Yao and Boaz Levi and Lucas T Gray and Staci A Sorensen and Tim Dolbeare and Darren Bertagnolli and Jeff Goldy and Nadiya Shapovalova and Sheana Parry and Changkyu Lee and Kimberly Smith and Amy Bernard and Linda Madisen and Susan M Sunkin and Michael Hawrylycz and Christof Koch and Hongkui Zeng}, + year = {2016}, + month = {Jan}, + journal = {Nature Neuroscience}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {19}, + number = {2}, + pages = {335--346}, + doi = {10.1038/nn.4216}, + url = {https://doi.org/10.1038/nn.4216} +} + + +@article{tian2019benchmarking, + title = {Benchmarking single cell {RNA}-sequencing analysis pipelines using mixture control experiments}, + author = {Luyi Tian and Xueyi Dong and Saskia Freytag and Kim-Anh L{\^{e}} Cao and Shian Su and Abolfazl JalalAbadi and Daniela Amann-Zalcenstein and Tom S. Weber and Azadeh Seidi and Jafar S. Jabbari and Shalin H. Naik and Matthew E. Ritchie}, + year = {2019}, + month = {May}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {16}, + number = {6}, + pages = {479--487}, + doi = {10.1038/s41592-019-0425-8}, + url = {https://doi.org/10.1038/s41592-019-0425-8} +} + + +@article{tran2020benchmark, + doi = {10.1186/s13059-019-1850-9}, + url = {https://doi.org/10.1186/s13059-019-1850-9}, + year = {2020}, + month = {Jan}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {21}, + number = {1}, + author = {Hoa Thi Nhu Tran and Kok Siong Ang and Marion Chevrier and Xiaomeng Zhang and Nicole Yee Shin Lee and Michelle Goh and Jinmiao Chen}, + title = {A benchmark of batch-effect correction methods for single-cell {RNA} sequencing data}, + journal = {Genome Biology} +} + + +@article{van2018recovering, + title = {Recovering Gene Interactions from Single-Cell Data Using Data Diffusion}, + author = {David van Dijk and Roshan Sharma and Juozas Nainys and Kristina Yim and Pooja Kathail and Ambrose J. Carr and Cassandra Burdziak and Kevin R. Moon and Christine L. 
Chaffer and Diwakar Pattabiraman and Brian Bierie and Linas Mazutis and Guy Wolf and Smita Krishnaswamy and Dana Pe'er}, + year = {2018}, + month = {Jul.}, + journal = {Cell}, + publisher = {Elsevier {BV}}, + volume = {174}, + number = {3}, + pages = {716--729.e27}, + doi = {10.1016/j.cell.2018.05.061}, + url = {https://doi.org/10.1016/j.cell.2018.05.061} +} + + +@article{vandermaaten2008visualizing, + title = {Visualizing Data using t-SNE}, + author = {{van der} Maaten, Laurens and Hinton, Geoffrey}, + year = {2008}, + journal = {Journal of Machine Learning Research}, + volume = {9}, + number = {86}, + pages = {2579--2605}, + url = {http://jmlr.org/papers/v9/vandermaaten08a.html} +} + + +@inproceedings{venna2001neighborhood, + title = {Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study}, + author = {Jarkko Venna and Samuel Kaski}, + year = {2001}, + booktitle = {Artificial Neural Networks {\textemdash} {ICANN} 2001}, + publisher = {Springer Berlin Heidelberg}, + pages = {485--491}, + doi = {{10.1007/3-540-44668-0\_68}}, + url = {{https://doi.org/10.1007/3-540-44668-0\_68}} +} + + +@article{venna2006local, + title = {Local multidimensional scaling}, + author = {Jarkko Venna and Samuel Kaski}, + year = {2006}, + month = {Jul.}, + journal = {Neural Networks}, + publisher = {Elsevier {BV}}, + volume = {19}, + number = {6-7}, + pages = {889--899}, + doi = {10.1016/j.neunet.2006.05.014}, + url = {https://doi.org/10.1016/j.neunet.2006.05.014} +} + + +@article{virshup2021anndataannotateddata, + doi = {10.1101/2021.12.16.473007}, + url = {https://doi.org/10.1101/2021.12.16.473007}, + year = {2021}, + month = {Dec.}, + publisher = {Cold Spring Harbor Laboratory}, + author = {Isaac Virshup and Sergei Rybakov and Fabian J. Theis and Philipp Angerer and F. Alexander Wolf}, + title = {anndata: Annotated data} +} + + +@article{wagner2018knearest, + title = {K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data}, + author = {Wagner, Florian and Yan, Yun and Yanai, Itai}, + year = {2018}, + journal = {bioRxiv}, + publisher = {Cold Spring Harbor Laboratory}, + doi = {10.1101/217737}, + url = {https://www.biorxiv.org/content/early/2018/04/09/217737}, + elocation-id = {217737}, + eprint = {https://www.biorxiv.org/content/early/2018/04/09/217737.full.pdf} +} + + +@article{wagner2018single, + title = {Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo}, + author = {Daniel E. Wagner and Caleb Weinreb and Zach M. Collins and James A. Briggs and Sean G. Megason and Allon M. 
Klein}, + year = {2018}, + month = {Jun.}, + journal = {Science}, + publisher = {American Association for the Advancement of Science ({AAAS})}, + volume = {360}, + number = {6392}, + pages = {981--987}, + doi = {10.1126/science.aar4362}, + url = {https://doi.org/10.1126/science.aar4362} +} + + +@article{wang2013target, + title = {Target analysis by integration of transcriptome and {ChIP}-seq data with {BETA}}, + author = {Su Wang and Hanfei Sun and Jian Ma and Chongzhi Zang and Chenfei Wang and Juan Wang and Qianzi Tang and Clifford A Meyer and Yong Zhang and X Shirley Liu}, + year = {2013}, + month = {Nov.}, + journal = {Nature Protocols}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {8}, + number = {12}, + pages = {2502--2515}, + doi = {10.1038/nprot.2013.150}, + url = {https://doi.org/10.1038/nprot.2013.150} +} + + +@article{wang2017visualization, + title = {Visualization and analysis of single-cell {RNA}-seq data by kernel-based similarity learning}, + volume = {14}, + copyright = {2017 Springer Nature America, Inc.}, + issn = {1548-7105}, + url = {https://www.nature.com/articles/nmeth.4207}, + doi = {10.1038/nmeth.4207}, + abstract = {The SIMLR software identifies similarities between cells across a range of single-cell RNA-seq data, enabling effective dimension reduction, clustering and visualization.}, + language = {en}, + number = {4}, + journal = {Nature Methods}, + author = {Wang, Bo and Zhu, Junjie and Pierson, Emma and Ramazzotti, Daniele and Batzoglou, Serafim}, + month = apr, + year = {2017}, + publisher = {Nature Publishing Group}, + keywords = {Gene expression, Genome informatics, Machine learning, Statistical methods}, + pages = {414--416}, +} + + +@article{wang2018three, + title = {Three-dimensional intact-tissue sequencing of single-cell transcriptional states}, + volume = {361}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.aat5691}, + DOI = {10.1126/science.aat5691}, + number = {6400}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Wang, Xiao and Allen, William E. and Wright, Matthew A. and Sylwestrak, Emily L. and Samusik, Nikolay and Vesuna, Sam and Evans, Kathryn and Liu, Cindy and Ramakrishnan, Charu and Liu, Jia and Nolan, Garry P. 
and Bava, Felice-Alessio and Deisseroth, Karl}, + year = {2018}, + month = jul +} + + +@article{wang2022high, + title = {High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae}, + volume = {57}, + ISSN = {1534-5807}, + url = {http://dx.doi.org/10.1016/j.devcel.2022.04.006}, + DOI = {10.1016/j.devcel.2022.04.006}, + number = {10}, + journal = {Developmental Cell}, + publisher = {Elsevier BV}, + author = {Wang, Mingyue and Hu, Qinan and Lv, Tianhang and Wang, Yuhang and Lan, Qing and Xiang, Rong and Tu, Zhencheng and Wei, Yanrong and Han, Kai and Shi, Chang and Guo, Junfu and Liu, Chao and Yang, Tao and Du, Wensi and An, Yanru and Cheng, Mengnan and Xu, Jiangshan and Lu, Haorong and Li, Wangsheng and Zhang, Shaofang and Chen, Ao and Chen, Wei and Li, Yuxiang and Wang, Xiaoshan and Xu, Xun and Hu, Yuhui and Liu, Longqi}, + year = {2022}, + month = may, + pages = {1271--1283.e4} +} + + +@article{weber2023nnsvg, + title={nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes}, + author={Weber, Lukas M and Saha, Arkajyoti and Datta, Abhirup and Hansen, Kasper D and Hicks, Stephanie C}, + journal={Nature communications}, + volume={14}, + number={1}, + pages={4059}, + year={2023}, + publisher={Nature Publishing Group UK London}, + doi={10.1038/s41467-023-39748-z} +} + + +@article{welch2019single, + title = {Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity}, + author = {Joshua D. Welch and Velina Kozareva and Ashley Ferreira and Charles Vanderburg and Carly Martin and Evan Z. Macosko}, + year = {2019}, + month = {Jun.}, + journal = {Cell}, + publisher = {Elsevier {BV}}, + volume = {177}, + number = {7}, + pages = {1873--1887.e17}, + doi = {10.1016/j.cell.2019.05.006}, + url = {https://doi.org/10.1016/j.cell.2019.05.006} +} + + +@article{wilkinson1973symbolic, + doi = {10.2307/2346786}, + url = {https://doi.org/10.2307/2346786}, + year = {1973}, + publisher = {{JSTOR}}, + volume = {22}, + number = {3}, + pages = {392}, + author = {G. N. Wilkinson and C. E. Rogers}, + title = {Symbolic Description of Factorial Models for Analysis of Variance}, + journal = {Applied Statistics} +} + + +@article{wu2021single, + title = {A single-cell and spatially resolved atlas of human breast cancers}, + author = {Sunny Z. Wu and Ghamdan Al-Eryani and Daniel Lee Roden and Simon Junankar and Kate Harvey and Alma Andersson and Aatish Thennavan and Chenfei Wang and James R. Torpy and Nenad Bartonicek and Taopeng Wang and Ludvig Larsson and Dominik Kaczorowski and Neil I. Weisenfeld and Cedric R. Uytingco and Jennifer G. Chew and Zachary W. Bent and Chia-Ling Chan and Vikkitharan Gnanasambandapillai and Charles-Antoine Dutertre and Laurence Gluch and Mun N. Hui and Jane Beith and Andrew Parker and Elizabeth Robbins and Davendra Segara and Caroline Cooper and Cindy Mak and Belinda Chan and Sanjay Warrier and Florent Ginhoux and Ewan Millar and Joseph E. Powell and Stephen R. Williams and X. Shirley Liu and Sandra O'Toole and Elgene Lim and Joakim Lundeberg and Charles M. 
Perou and Alexander Swarbrick}, + year = {2021}, + month = {Sept.}, + journal = {Nature Genetics}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {53}, + number = {9}, + pages = {1334--1347}, + doi = {10.1038/s41588-021-00911-1}, + url = {https://doi.org/10.1038/s41588-021-00911-1} +} + + +@article{xiong2020neuralee, + title = {{NeuralEE}: A {GPU}-Accelerated Elastic Embedding Dimensionality Reduction Method for Visualizing Large-Scale {scRNA}-Seq Data}, + author = {Jiankang Xiong and Fuzhou Gong and Lin Wan and Liang Ma}, + year = {2020}, + month = {Oct.}, + journal = {Frontiers in Genetics}, + publisher = {Frontiers Media {SA}}, + volume = {11}, + doi = {10.3389/fgene.2020.00786}, + url = {https://doi.org/10.3389/fgene.2020.00786} +} + + +@article{xiong2021online, + title = {Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space}, + author = {Lei Xiong and Kang Tian and Yuzhe Li and Weixi Ning and Xin Gao and Qiangfeng Cliff Zhang}, + year = {2022}, + month = {Oct.}, + journal = {Nature Communications}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {13}, + number = {1}, + doi = {10.1038/s41467-022-33758-z}, + url = {https://doi.org/10.1038/s41467-022-33758-z} +} + + +@article{xu2021probabilistic, + title = {Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models}, + author = {Chenling Xu and Romain Lopez and Edouard Mehlman and Jeffrey Regier and Michael I Jordan and Nir Yosef}, + year = {2021}, + month = {Jan}, + journal = {Molecular Systems Biology}, + publisher = {{Embo}}, + volume = {17}, + number = {1}, + doi = {10.15252/msb.20209620}, + url = {https://doi.org/10.15252/msb.20209620} +} + + +@article{zappia2018exploring, + doi = {10.1371/journal.pcbi.1006245}, + url = {https://doi.org/10.1371/journal.pcbi.1006245}, + year = {2018}, + month = {Jun.}, + publisher = {Public Library of Science ({PLoS})}, + volume = {14}, + number = {6}, + pages = {e1006245}, + author = {Luke Zappia and Belinda Phipson and Alicia Oshlack}, + editor = {Dina Schneidman}, + title = {Exploring the single-cell {RNA}-seq analysis landscape with the {scRNA}-tools database}, + journal = {{PLOS} Computational Biology} +} + + +@article{zhang2021pydrmetrics, + title = {{pyDRMetrics} - A Python toolkit for dimensionality reduction quality assessment}, + author = {Yinsheng Zhang and Qian Shang and Guoming Zhang}, + year = {2021}, + month = {Feb.}, + journal = {Heliyon}, + publisher = {Elsevier {BV}}, + volume = {7}, + number = {2}, + pages = {e06199}, + doi = {10.1016/j.heliyon.2021.e06199}, + url = {https://doi.org/10.1016/j.heliyon.2021.e06199} +} + + +@article{zhang2022identification, + title={Identification of spatially variable genes with graph cuts}, + author={Zhang, Ke and Feng, Wanwan and Wang, Peng}, + journal={Nature Communications}, + volume={13}, + number={1}, + pages={5488}, + year={2022}, + publisher={Nature Publishing Group UK London}, + doi={10.1038/s41467-022-33182-3} +} + + +@article{zhu2021spark, + title={SPARK-X: non-parametric modeling enables scalable and robust detection of spatial expression patterns for large spatial transcriptomic studies}, + author={Zhu, Jiaqiang and Sun, Shiquan and Zhou, Xiang}, + journal={Genome biology}, + volume={22}, + number={1}, + pages={184}, + year={2021}, + publisher={Springer}, + doi={10.1186/s13059-021-02404-0} +} + + +@article {hrovatin2023delineating, + author = {Karin Hrovatin and Aim{\'e}e Bastidas-Ponce 
and Mostafa Bakhti and Luke Zappia and Maren B{\"u}ttner and Ciro Sallino and Michael Sterr and Anika B{\"o}ttcher and Adriana Migliorini and Heiko Lickert and Fabian J. Theis}, + title = {Delineating mouse β-cell identity during lifetime and in diabetes with a single cell atlas}, + elocation-id = {2022.12.22.521557}, + year = {2023}, + doi = {10.1101/2022.12.22.521557}, + publisher = {Cold Spring Harbor Laboratory}, + URL = {https://www.biorxiv.org/content/early/2023/04/25/2022.12.22.521557}, + eprint = {https://www.biorxiv.org/content/early/2023/04/25/2022.12.22.521557.full.pdf}, + journal = {bioRxiv} +} + +@article{sikkema2023integrated, + title = {An integrated cell atlas of the lung in health and disease}, + volume = {29}, + ISSN = {1546-170X}, + url = {http://dx.doi.org/10.1038/s41591-023-02327-2}, + DOI = {10.1038/s41591-023-02327-2}, + number = {6}, + journal = {Nature Medicine}, + publisher = {Springer Science and Business Media LLC}, + author = {Sikkema, Lisa and Ramírez-Suástegui, Ciro and Strobl, Daniel C. and Gillett, Tessa E. and Zappia, Luke and Madissoon, Elo and Markov, Nikolay S. and Zaragosi, Laure-Emmanuelle and Ji, Yuge and Ansari, Meshal and Arguel, Marie-Jeanne and Apperloo, Leonie and Banchero, Martin and Bécavin, Christophe and Berg, Marijn and Chichelnitskiy, Evgeny and Chung, Mei-i and Collin, Antoine and Gay, Aurore C. A. and Gote-Schniering, Janine and Hooshiar Kashani, Baharak and Inecik, Kemal and Jain, Manu and Kapellos, Theodore S. and Kole, Tessa M. and Leroy, Sylvie and Mayr, Christoph H. and Oliver, Amanda J. and von Papen, Michael and Peter, Lance and Taylor, Chase J. and Walzthoeni, Thomas and Xu, Chuan and Bui, Linh T. and De Donno, Carlo and Dony, Leander and Faiz, Alen and Guo, Minzhe and Gutierrez, Austin J. and Heumos, Lukas and Huang, Ni and Ibarra, Ignacio L. and Jackson, Nathan D. and Kadur Lakshminarasimha Murthy, Preetish and Lotfollahi, Mohammad and Tabib, Tracy and Talavera-López, Carlos and Travaglini, Kyle J. and Wilbrey-Clark, Anna and Worlock, Kaylee B. and Yoshida, Masahiro and Chen, Yuexin and Hagood, James S. and Agami, Ahmed and Horvath, Peter and Lundeberg, Joakim and Marquette, Charles-Hugo and Pryhuber, Gloria and Samakovlis, Chistos and Sun, Xin and Ware, Lorraine B. and Zhang, Kun and van den Berge, Maarten and Bossé, Yohan and Desai, Tushar J. and Eickelberg, Oliver and Kaminski, Naftali and Krasnow, Mark A. and Lafyatis, Robert and Nikolic, Marko Z. and Powell, Joseph E. and Rajagopal, Jayaraj and Rojas, Mauricio and Rozenblatt-Rosen, Orit and Seibold, Max A. and Sheppard, Dean and Shepherd, Douglas P. and Sin, Don D. and Timens, Wim and Tsankov, Alexander M. and Whitsett, Jeffrey and Xu, Yan and Banovich, Nicholas E. and Barbry, Pascal and Duong, Thu Elizabeth and Falk, Christine S. and Meyer, Kerstin B. and Kropski, Jonathan A. and Pe’er, Dana and Schiller, Herbert B. and Tata, Purushothama Rao and Schultze, Joachim L. and Teichmann, Sara A. and Misharin, Alexander V. and Nawijn, Martijn C. and Luecken, Malte D. and Theis, Fabian J.}, + year = {2023}, + month = jun, + pages = {1563–1577} +} + +@article{consortium2022tabula, + title = {The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans}, + volume = {376}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.abl4896}, + DOI = {10.1126/science.abl4896}, + number = {6594}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Jones, Robert C. 
and Karkanias, Jim and Krasnow, Mark A. and Pisco, Angela Oliveira and Quake, Stephen R. and Salzman, Julia and Yosef, Nir and Bulthaup, Bryan and Brown, Phillip and Harper, William and Hemenez, Marisa and Ponnusamy, Ravikumar and Salehi, Ahmad and Sanagavarapu, Bhavani A. and Spallino, Eileen and Aaron, Ksenia A. and Concepcion, Waldo and Gardner, James M. and Kelly, Burnett and Neidlinger, Nikole and Wang, Zifa and Crasta, Sheela and Kolluru, Saroja and Morri, Maurizio and Pisco, Angela Oliveira and Tan, Serena Y. and Travaglini, Kyle J. and Xu, Chenling and Alcántara-Hernández, Marcela and Almanzar, Nicole and Antony, Jane and Beyersdorf, Benjamin and Burhan, Deviana and Calcuttawala, Kruti and Carter, Matthew M. and Chan, Charles K. F. and Chang, Charles A. and Chang, Stephen and Colville, Alex and Crasta, Sheela and Culver, Rebecca N. and Cvijović, Ivana and D’Amato, Gaetano and Ezran, Camille and Galdos, Francisco X. and Gillich, Astrid and Goodyer, William R. and Hang, Yan and Hayashi, Alyssa and Houshdaran, Sahar and Huang, Xianxi and Irwin, Juan C. and Jang, SoRi and Juanico, Julia Vallve and Kershner, Aaron M. and Kim, Soochi and Kiss, Bernhard and Kolluru, Saroja and Kong, William and Kumar, Maya E. and Kuo, Angera H. and Leylek, Rebecca and Li, Baoxiang and Loeb, Gabriel B. and Lu, Wan-Jin and Mantri, Sruthi and Markovic, Maxim and McAlpine, Patrick L. and de Morree, Antoine and Morri, Maurizio and Mrouj, Karim and Mukherjee, Shravani and Muser, Tyler and Neuh\"{o}fer, Patrick and Nguyen, Thi D. and Perez, Kimberly and Phansalkar, Ragini and Pisco, Angela Oliveira and Puluca, Nazan and Qi, Zhen and Rao, Poorvi and Raquer-McKay, Hayley and Schaum, Nicholas and Scott, Bronwyn and Seddighzadeh, Bobak and Segal, Joe and Sen, Sushmita and Sikandar, Shaheen and Spencer, Sean P. and Steffes, Lea C. and Subramaniam, Varun R. and Swarup, Aditi and Swift, Michael and Travaglini, Kyle J. and Van Treuren, Will and Trimm, Emily and Veizades, Stefan and Vijayakumar, Sivakamasundari and Vo, Kim Chi and Vorperian, Sevahn K. and Wang, Wanxin and Weinstein, Hannah N. W. and Winkler, Juliane and Wu, Timothy T. H. and Xie, Jamie and Yung, Andrea R. and Zhang, Yue and Detweiler, Angela M. and Mekonen, Honey and Neff, Norma F. and Sit, Rene V. and Tan, Michelle and Yan, Jia and Bean, Gregory R. and Charu, Vivek and Forgó, Erna and Martin, Brock A. and Ozawa, Michael G. and Silva, Oscar and Tan, Serena Y. and Toland, Angus and Vemuri, Venkata N. P. and Afik, Shaked and Awayan, Kyle and Botvinnik, Olga Borisovna and Byrne, Ashley and Chen, Michelle and Dehghannasiri, Roozbeh and Detweiler, Angela M. and Gayoso, Adam and Granados, Alejandro A. and Li, Qiqing and Mahmoudabadi, Gita and McGeever, Aaron and de Morree, Antoine and Olivieri, Julia Eve and Park, Madeline and Pisco, Angela Oliveira and Ravikumar, Neha and Salzman, Julia and Stanley, Geoff and Swift, Michael and Tan, Michelle and Tan, Weilun and Tarashansky, Alexander J. and Vanheusden, Rohan and Vorperian, Sevahn K. and Wang, Peter and Wang, Sheng and Xing, Galen and Xu, Chenling and Yosef, Nir and Alcántara-Hernández, Marcela and Antony, Jane and Chan, Charles K. F. and Chang, Charles A. and Colville, Alex and Crasta, Sheela and Culver, Rebecca and Dethlefsen, Les and Ezran, Camille and Gillich, Astrid and Hang, Yan and Ho, Po-Yi and Irwin, Juan C. and Jang, SoRi and Kershner, Aaron M. and Kong, William and Kumar, Maya E. and Kuo, Angera H. and Leylek, Rebecca and Liu, Shixuan and Loeb, Gabriel B. and Lu, Wan-Jin and Maltzman, Jonathan S. 
and Metzger, Ross J. and de Morree, Antoine and Neuh\"{o}fer, Patrick and Perez, Kimberly and Phansalkar, Ragini and Qi, Zhen and Rao, Poorvi and Raquer-McKay, Hayley and Sasagawa, Koki and Scott, Bronwyn and Sinha, Rahul and Song, Hanbing and Spencer, Sean P. and Swarup, Aditi and Swift, Michael and Travaglini, Kyle J. and Trimm, Emily and Veizades, Stefan and Vijayakumar, Sivakamasundari and Wang, Bruce and Wang, Wanxin and Winkler, Juliane and Xie, Jamie and Yung, Andrea R. and Artandi, Steven E. and Beachy, Philip A. and Clarke, Michael F. and Giudice, Linda C. and Huang, Franklin W. and Huang, Kerwyn Casey and Idoyaga, Juliana and Kim, Seung K. and Krasnow, Mark and Kuo, Christin S. and Nguyen, Patricia and Quake, Stephen R. and Rando, Thomas A. and Red-Horse, Kristy and Reiter, Jeremy and Relman, David A. and Sonnenburg, Justin L. and Wang, Bruce and Wu, Albert and Wu, Sean M. and Wyss-Coray, Tony}, + year = {2022}, + month = may +} + +@article{dominguez2022crosstissue, + title = {Cross-tissue immune cell analysis reveals tissue-specific features in humans}, + volume = {376}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.abl5197}, + DOI = {10.1126/science.abl5197}, + number = {6594}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Domínguez Conde, C. and Xu, C. and Jarvis, L. B. and Rainbow, D. B. and Wells, S. B. and Gomes, T. and Howlett, S. K. and Suchanek, O. and Polanski, K. and King, H. W. and Mamanova, L. and Huang, N. and Szabo, P. A. and Richardson, L. and Bolt, L. and Fasouli, E. S. and Mahbubani, K. T. and Prete, M. and Tuck, L. and Richoz, N. and Tuong, Z. K. and Campos, L. and Mousa, H. S. and Needham, E. J. and Pritchard, S. and Li, T. and Elmentaite, R. and Park, J. and Rahmani, E. and Chen, D. and Menon, D. K. and Bayraktar, O. A. and James, L. K. and Meyer, K. B. and Yosef, N. and Clatworthy, M. R. and Sims, P. A. and Farber, D. L. and Saeb-Parsy, K. and Jones, J. L. and Teichmann, S. A.}, + year = {2022}, + month = may +} + +@article{eraslan2022singlenucleus, + title = {Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function}, + volume = {376}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.abl4290}, + DOI = {10.1126/science.abl4290}, + number = {6594}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Eraslan, G\"{o}kcen and Drokhlyansky, Eugene and Anand, Shankara and Fiskin, Evgenij and Subramanian, Ayshwarya and Slyper, Michal and Wang, Jiali and Van Wittenberghe, Nicholas and Rouhana, John M. and Waldman, Julia and Ashenberg, Orr and Lek, Monkol and Dionne, Danielle and Win, Thet Su and Cuoco, Michael S. and Kuksenko, Olena and Tsankov, Alexander M. and Branton, Philip A. and Marshall, Jamie L. and Greka, Anna and Getz, Gad and Segrè, Ayellet V. and Aguet, Fran\c{c}ois and Rozenblatt-Rosen, Orit and Ardlie, Kristin G. 
and Regev, Aviv}, + year = {2022}, + month = may +} + +@article{li2023integrated, + title = {Integrated multi-omics single cell atlas of the human retina}, + url = {http://dx.doi.org/10.1101/2023.11.07.566105}, + DOI = {10.1101/2023.11.07.566105}, + publisher = {Cold Spring Harbor Laboratory}, + author = {Li, Jin and Wang, Jun and Ibarra, Ignacio L and Cheng, Xuesen and Luecken, Malte D and Lu, Jiaxiong and Monavarfeshani, Aboozar and Yan, Wenjun and Zheng, Yiqiao and Zuo, Zhen and Zayas Colborn, Samantha Lynn and Cortez, Berenice Sarahi and Owen, Leah A and Tran, Nicholas M and Shekhar, Karthik and Sanes, Joshua R and Stout, J Timothy and Chen, Shiming and Li, Yumei and DeAngelis, Margaret M and Theis, Fabian J and Chen, Rui}, + year = {2023}, + month = nov +} + +@article{wilson2022multimodal, + title = {Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression}, + volume = {13}, + ISSN = {2041-1723}, + url = {http://dx.doi.org/10.1038/s41467-022-32972-z}, + DOI = {10.1038/s41467-022-32972-z}, + number = {1}, + journal = {Nature Communications}, + publisher = {Springer Science and Business Media LLC}, + author = {Wilson, Parker C. and Muto, Yoshiharu and Wu, Haojia and Karihaloo, Anil and Waikar, Sushrut S. and Humphreys, Benjamin D.}, + year = {2022}, + month = sep +} + +@article{steuernagel2022hypomap, + title = {HypoMap—a unified single-cell gene expression atlas of the murine hypothalamus}, + volume = {4}, + ISSN = {2522-5812}, + url = {http://dx.doi.org/10.1038/s42255-022-00657-y}, + DOI = {10.1038/s42255-022-00657-y}, + number = {10}, + journal = {Nature Metabolism}, + publisher = {Springer Science and Business Media LLC}, + author = {Steuernagel, Lukas and Lam, Brian Y. H. and Klemm, Paul and Dowsett, Georgina K. C. and Bauder, Corinna A. and Tadross, John A. and Hitschfeld, Tamara Sotelo and del Rio Martin, Almudena and Chen, Weiyi and de Solis, Alain J. and Fenselau, Henning and Davidsen, Peter and Cimino, Irene and Kohnke, Sara N. and Rimmington, Debra and Coll, Anthony P. and Beyer, Andreas and Yeo, Giles S. H. and Br\"{u}ning, Jens C.}, + year = {2022}, + month = oct, + pages = {1402–1419} +} + +@article{tian2023singlecell, + title = {Single-cell DNA methylation and 3D genome architecture in the human brain}, + volume = {382}, + ISSN = {1095-9203}, + url = {http://dx.doi.org/10.1126/science.adf5357}, + DOI = {10.1126/science.adf5357}, + number = {6667}, + journal = {Science}, + publisher = {American Association for the Advancement of Science (AAAS)}, + author = {Tian, Wei and Zhou, Jingtian and Bartlett, Anna and Zeng, Qiurui and Liu, Hanqing and Castanon, Rosa G. and Kenworthy, Mia and Altshul, Jordan and Valadon, Cynthia and Aldridge, Andrew and Nery, Joseph R. and Chen, Huaming and Xu, Jiaying and Johnson, Nicholas D. and Lucero, Jacinta and Osteen, Julia K. and Emerson, Nora and Rink, Jon and Lee, Jasper and Li, Yang E. and Siletti, Kimberly and Liem, Michelle and Claffey, Naomi and O’Connor, Carolyn and Yanny, Anna Marie and Nyhus, Julie and Dee, Nick and Casper, Tamara and Shapovalova, Nadiya and Hirschstein, Daniel and Ding, Song-Lin and Hodge, Rebecca and Levi, Boaz P. and Keene, C. Dirk and Linnarsson, Sten and Lein, Ed and Ren, Bing and Behrens, M. 
Margarita and Ecker, Joseph R.}, + year = {2023}, + month = oct +} + + +@article{sonrel2023metaanalysis, + title = {Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability}, + volume = {24}, + ISSN = {1474-760X}, + url = {http://dx.doi.org/10.1186/s13059-023-02962-5}, + DOI = {10.1186/s13059-023-02962-5}, + number = {1}, + journal = {Genome Biology}, + publisher = {Springer Science and Business Media LLC}, + author = {Sonrel, Anthony and Luetge, Almut and Soneson, Charlotte and Mallona, Izaskun and Germain, Pierre-Luc and Knyazev, Sergey and Gilis, Jeroen and Gerber, Reto and Seurinck, Ruth and Paul, Dominique and Sonder, Emanuel and Crowell, Helena L. and Fanaswala, Imran and Al-Ajami, Ahmad and Heidari, Elyas and Schmeing, Stephan and Milosavljevic, Stefan and Saeys, Yvan and Mangul, Serghei and Robinson, Mark D.}, + year = {2023}, + month = may +} + + +@article{saelens2019comparison, + title = {A comparison of single-cell trajectory inference methods}, + volume = {37}, + ISSN = {1546-1696}, + url = {http://dx.doi.org/10.1038/s41587-019-0071-9}, + DOI = {10.1038/s41587-019-0071-9}, + number = {5}, + journal = {Nature Biotechnology}, + publisher = {Springer Science and Business Media LLC}, + author = {Saelens, Wouter and Cannoodt, Robrecht and Todorov, Helena and Saeys, Yvan}, + year = {2019}, + month = apr, + pages = {547–554} +} + + +@article{huang2018savergene, + title = {SAVER: gene expression recovery for single-cell RNA sequencing}, + volume = {15}, + ISSN = {1548-7105}, + url = {http://dx.doi.org/10.1038/s41592-018-0033-z}, + DOI = {10.1038/s41592-018-0033-z}, + number = {7}, + journal = {Nature Methods}, + publisher = {Springer Science and Business Media LLC}, + author = {Huang, Mo and Wang, Jingshu and Torre, Eduardo and Dueck, Hannah and Shaffer, Sydney and Bonasio, Roberto and Murray, John I. and Raj, Arjun and Li, Mingyao and Zhang, Nancy R.}, + year = {2018}, + month = jun, + pages = {539–542} +} + + +@article{chari2023speciousart, + title = {The specious art of single-cell genomics}, + volume = {19}, + ISSN = {1553-7358}, + url = {http://dx.doi.org/10.1371/journal.pcbi.1011288}, + DOI = {10.1371/journal.pcbi.1011288}, + number = {8}, + journal = {PLOS Computational Biology}, + publisher = {Public Library of Science (PLoS)}, + author = {Chari, Tara and Pachter, Lior}, + editor = {Papin, Jason A.}, + year = {2023}, + month = aug, + pages = {e1011288} +} + diff --git a/src/common/process_dataset_metadata/run/config.vsh.yaml b/src/common/process_dataset_metadata/run/config.vsh.yaml new file mode 100644 index 0000000000..550b621ef6 --- /dev/null +++ b/src/common/process_dataset_metadata/run/config.vsh.yaml @@ -0,0 +1,29 @@ +functionality: + name: run + namespace: common/process_dataset_metadata + description: >- + This workflow transforms the meta information of the datasets into a format + that can be used by the website. 
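Aside: the only dependency of this workflow is common/process_task_results/yaml_to_json, so the transformation it performs amounts to reading a dataset's meta.yaml and re-serialising it as JSON for the website. Below is a minimal Python sketch of that step; the file names follow the argument examples in this config, and the helper name is illustrative only, not part of the component.

# Minimal sketch of the yaml_to_json step this workflow wraps (illustrative only).
import json
import yaml

def convert_meta(input_path: str = "meta.yaml", output_path: str = "meta.json") -> None:
    # read the dataset metadata ...
    with open(input_path, "r") as f:
        meta = yaml.safe_load(f)
    # ... and write it back out as indented JSON for the website
    with open(output_path, "w") as f:
        json.dump(meta, f, indent=2)

if __name__ == "__main__":
    convert_meta()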
+ argument_groups: + - name: Inputs + arguments: + - name: "--input" + type: file + required: true + direction: input + example: meta.yaml + - name: Outputs + arguments: + - name: "--output" + type: file + required: true + direction: output + default: meta.json + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + dependencies: + - name: common/process_task_results/yaml_to_json +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/common/process_dataset_metadata/run/main.nf b/src/common/process_dataset_metadata/run/main.nf new file mode 100644 index 0000000000..2e453d5d52 --- /dev/null +++ b/src/common/process_dataset_metadata/run/main.nf @@ -0,0 +1,17 @@ +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | yaml_to_json.run( + fromState: ["input"], + toState: ["output"] + ) + + | setState(["output"]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/common/process_dataset_metadata/run/run.sh b/src/common/process_dataset_metadata/run/run.sh new file mode 100644 index 0000000000..27ea225ed3 --- /dev/null +++ b/src/common/process_dataset_metadata/run/run.sh @@ -0,0 +1,53 @@ +#!/bin/bash + +# fail on error +set -e + +# ensure we're in the root of the repo +REPO_ROOT=$(git rev-parse --show-toplevel) +cd "$REPO_ROOT" + +DATASET_DIR="s3://openproblems-data/resources/datasets/" + +for LOADER in $(aws s3 ls $DATASET_DIR); do + + if [ "$LOADER" == "PRE" ]; then + continue + fi + + BASE_DIR="${DATASET_DIR%/}/$LOADER" + + for DATASET in $(aws s3 ls $BASE_DIR); do + + if [ "$DATASET" == "PRE" ]; then + continue + fi + + FILE_DIR="${BASE_DIR%/}/${DATASET%/}/log_cp10k/" + FILES=$(aws s3 ls $FILE_DIR) + metafiles=$(echo "$FILES" | grep "meta" | awk '{print $NF}') + # metafiles=$(find $INPUT -type f -name "*meta*") + # echo $metafiles + + for metafile in $metafiles; do + INPUT="${FILE_DIR%/}/$metafile" + OUTPUT_DIR="../website/datasets/$LOADER/${DATASET%/}/data/" + OUTPUT_FILE="${metafile%.*}.json" + echo "Processing $LOADER - $DATASET : $INPUT" + + # start the + NXF_VER=23.10.0 nextflow run . 
\ + -main-script target/nextflow/common/process_dataset_metadata/run/main.nf \ + -profile docker \ + -c src/wf_utils/labels_ci.config \ + --id "extract_metadata" \ + --input "$INPUT" \ + --output "$OUTPUT_FILE" \ + --output_state "state.yaml" \ + --publish_dir "$OUTPUT_DIR" + done + +# cause quarto rerender to index page when in preview mode +# touch ../website/results/$TASK/index.qmd + done +done \ No newline at end of file diff --git a/src/common/process_task_results/api/get_info.yaml b/src/common/process_task_results/api/get_info.yaml new file mode 100644 index 0000000000..9691936615 --- /dev/null +++ b/src/common/process_task_results/api/get_info.yaml @@ -0,0 +1,23 @@ +functionality: + namespace: common/process_task_results + arguments: + - name: "--input" + type: "file" + example: + description: "A yaml file" + - name: "--task_id" + type: "string" + description: "A task dir" + example: label_projection + - name: "--output" + type: "file" + direction: "output" + default: "output.json" + description: "Output json" + test_resources: + - type: python_script + path: /src/common/comp_tests/check_get_info.py + - path: /src + dest: openproblems/src + - path: /_viash.yaml + dest: openproblems/_viash.yaml \ No newline at end of file diff --git a/src/common/process_task_results/generate_qc/config.vsh.yaml b/src/common/process_task_results/generate_qc/config.vsh.yaml new file mode 100644 index 0000000000..68a5d19682 --- /dev/null +++ b/src/common/process_task_results/generate_qc/config.vsh.yaml @@ -0,0 +1,39 @@ +functionality: + name: "generate_qc" + description: "Generate task QC metrics" + namespace: common/process_task_results + arguments: + - name: "--task_info" + type: "file" + example: task_info.json + description: "Task info file" + - name: "--method_info" + type: "file" + example: method_info.json + description: "Method info file" + - name: "--metric_info" + type: "file" + example: metric_info.json + description: "Metric info file" + - name: "--dataset_info" + type: "file" + example: dataset_info.json + description: "Dataset info file" + - name: "--results" + type: "file" + example: results.json + description: "Results file" + - name: "--output" + type: "file" + direction: "output" + default: "output.json" + description: "Output json" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/generate_qc/script.py b/src/common/process_task_results/generate_qc/script.py new file mode 100644 index 0000000000..f15a877522 --- /dev/null +++ b/src/common/process_task_results/generate_qc/script.py @@ -0,0 +1,294 @@ +import json +import numpy as np + +## VIASH START +## VIASH END + +EXPECTED_TASK_FIELDS = ["task_id", "task_name", "task_summary", "task_description"] +EXPECTED_METHOD_FIELDS = ["task_id", "commit_sha", "method_id", "method_name", "method_summary", "paper_reference", "is_baseline"] +EXPECTED_METRIC_FIELDS = ["task_id", "commit_sha", "metric_id", "metric_name", "metric_summary", "paper_reference", "maximize"] +EXPECTED_DATASET_FIELDS = ["task_id", "dataset_id", "dataset_name", "dataset_summary", "data_reference", "data_url"] + +def dump_json(obj, fp): + """Dump to JSON in a numpy-safe fashion.""" + json.dump( + obj, + fp, + indent=4, + sort_keys=False, + separators=(", ", ": "), + ensure_ascii=False, + ) + +def create_quality_control(task_info, dataset_info, method_info, metric_info, results): + """Quality 
control to detect anomalies in the results.""" + task_id = task_info["task_id"] + + result_qc = [] + + def add_qc( + category: str, + name: str, + value, + severity_value: float, + code: str, + message: str, + ) -> None: + "Add an entry to the result qc" + if severity_value <= 1: + severity = 0 + elif severity_value <= 2: + severity = 1 + elif severity_value <= 3: + severity = 2 + else: + severity = 3 + result_qc.append({ + "task_id": task_id, + "category": category, + "name": name, + "value": value, + "severity": severity, + "severity_value": severity_value, + "code": code, + "message": message + }) + + def percent_missing(list_of_dicts, field): + are_missing = [] + for item in list_of_dicts: + if field == 'paper_reference' and item.get('is_baseline', False): + are_missing.append(0.0) + elif field in item and item[field] is not None: + are_missing.append(0.0) + else: + are_missing.append(1.0) + return np.mean(are_missing) + + # check task_info + for field in EXPECTED_TASK_FIELDS: + pct_missing = percent_missing([task_info], field) + add_qc( + "Task info", + f"Pct '{field}' missing", + pct_missing, + 3.0 if pct_missing > 0 else 0.0, + "percent_missing([task_info], field)", + f"Task metadata field '{field}' should be defined\n" + f" Task id: {task_id}\n" + f" Field: {field}\n" + ) + + # check method_info + for field in EXPECTED_METHOD_FIELDS: + pct_missing = percent_missing(method_info, field) + add_qc( + "Method info", + f"Pct '{field}' missing", + pct_missing, + 3.0 if pct_missing > 0 else 0.0, + "percent_missing(method_info, field)", + f"Method metadata field '{field}' should be defined\n" + f" Task id: {task_id}\n" + f" Field: {field}\n" + ) + + # check metric_info + for field in EXPECTED_METRIC_FIELDS: + pct_missing = percent_missing(metric_info, field) + add_qc( + "Metric info", + f"Pct '{field}' missing", + pct_missing, + 3.0 if pct_missing > 0 else 0.0, + "percent_missing(metric_info, field)", + f"Metric metadata field '{field}' should be defined\n" + f" Task id: {task_id}\n" + f" Field: {field}\n" + ) + + # check dataset_info + for field in EXPECTED_DATASET_FIELDS: + pct_missing = percent_missing(dataset_info, field) + add_qc( + "Dataset info", + f"Pct '{field}' missing", + pct_missing, + 3.0 if pct_missing > 0 else 0.0, + "percent_missing(dataset_info, field)", + f"Dataset metadata field '{field}' should be defined\n" + f" Task id: {task_id}\n" + f" Field: {field}\n" + ) + + # turn results into long format for easier processing + results_long = [ + { + "task_id": x["task_id"], + "method_id": x["method_id"], + "dataset_id": x["dataset_id"], + "metric_id": metric["metric_id"], + "metric_value" : x["metric_values"].get(metric["metric_id"]), + "scaled_score" : x["scaled_scores"].get(metric["metric_id"]), + } + for metric in metric_info + for x in results + ] + + # check percentage missing + pct_missing = 1 - len(results_long) / (len(method_info) * len(metric_info) * len(dataset_info)) + add_qc( + "Raw data", + "Number of results", + len(results), + pct_missing / .1, + "len(results) == len(method_info) * len(metric_info) * len(dataset_info)", + f"Number of results should be equal to #methods × #metrics × #datasets.\n" + f" Task id: {task_id}\n" + f" Number of results: {len(results)}\n" + f" Number of methods: {len(method_info)}\n" + f" Number of metrics: {len(metric_info)}\n" + f" Number of datasets: {len(dataset_info)}\n" + ) + + # QC per metric + for metric in metric_info: + metric_id = metric["metric_id"] + values = [ + res + for res in results_long + if res["metric_id"] == 
metric_id + and res["metric_value"] is not None + and np.isreal(res["metric_value"]) + ] + pct_missing = 1 - len(values) / len(dataset_info) / len(method_info) + + add_qc( + "Raw results", + f"Metric '{metric_id}' %missing", + pct_missing, + pct_missing / .1, + "pct_missing <= .1", + f"Percentage of missing results should be less than 10%.\n" + f" Task id: {task_id}\n" + f" Metric id: {metric_id}\n" + f" Percentage missing: {pct_missing*100:.0f}%\n" + ) + + # QC per method + for method in method_info: + method_id = method["method_id"] + values = [ + res + for res in results_long + if res["method_id"] == method_id + and res["metric_value"] is not None + and np.isreal(res["metric_value"]) + ] + pct_missing = 1 - len(values) / len(dataset_info) / len(metric_info) + + add_qc( + "Raw results", + f"Method '{method_id}' %missing", + pct_missing, + pct_missing / .1, + "pct_missing <= .1", + f"Percentage of missing results should be less than 10%.\n" + f" Task id: {task_id}\n" + f" method id: {method_id}\n" + f" Percentage missing: {pct_missing*100:.0f}%\n" + ) + + # QC per dataset + for dataset in dataset_info: + dataset_id = dataset["dataset_id"] + values = [ + res + for res in results_long + if res["dataset_id"] == dataset_id + and res["metric_value"] is not None + and np.isreal(res["metric_value"]) + ] + pct_missing = 1 - len(values) / len(metric_info) / len(method_info) + + add_qc( + "Raw results", + f"Dataset '{dataset_id}' %missing", + pct_missing, + pct_missing / .1, + "pct_missing <= .1", + f"Percentage of missing results should be less than 10%.\n" + f" Task id: {task_id}\n" + f" dataset id: {dataset_id}\n" + f" Percentage missing: {pct_missing*100:.0f}%\n" + ) + + + # QC per metric and method + for metric in metric_info: + for method in method_info: + metric_id = metric["metric_id"] + method_id = method["method_id"] + scores = [ + res["scaled_score"] + for res in results_long + if res["metric_id"] == metric_id + and res["method_id"] == method_id + and res["scaled_score"] is not None + and np.isreal(res["scaled_score"]) + ] + + if len(scores) >= 1: + worst_score = np.min(scores).item() + best_score = np.max(scores).item() + + add_qc( + "Scaling", + f"Worst score {method_id} {metric_id}", + worst_score, + worst_score / -1, + "worst_score >= -1", + f"Method {method_id} performs much worse than baselines.\n" + f" Task id: {task_id}\n" + f" Method id: {method_id}\n" + f" Metric id: {metric_id}\n" + f" Worst score: {worst_score}%\n" + ) + + add_qc( + "Scaling", + f"Best score {method_id} {metric_id}", + best_score, + best_score / 2, + "best_score <= 2", + f"Method {method_id} performs a lot better than baselines.\n" + f" Task id: {task_id}\n" + f" Method id: {method_id}\n" + f" Metric id: {metric_id}\n" + f" Best score: {best_score}%\n" + ) + + return result_qc + +def main(par): + # read data from files + with open(par["task_info"], "r", encoding="utf8") as file: + task_info = json.load(file) + with open(par["method_info"], "r", encoding="utf8") as file: + method_info = json.load(file) + with open(par["metric_info"], "r", encoding="utf8") as file: + metric_info = json.load(file) + with open(par["dataset_info"], "r", encoding="utf8") as file: + dataset_info = json.load(file) + with open(par["results"], "r", encoding="utf8") as file: + results = json.load(file) + + # create info objects + quality_control = create_quality_control(task_info, dataset_info, method_info, metric_info, results) + + # write data to files + with open(par["output"], "w", encoding="utf8") as file: + 
dump_json(quality_control, file) + +if __name__ == "__main__": + main(par) diff --git a/src/common/process_task_results/get_api_info/config.vsh.yaml b/src/common/process_task_results/get_api_info/config.vsh.yaml new file mode 100644 index 0000000000..0e7eb1696e --- /dev/null +++ b/src/common/process_task_results/get_api_info/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../api/get_info.yaml +functionality: + status: disabled + name: get_api_info + description: "Extract api info" + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, dplyr, yaml, rlang, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] + - type: native diff --git a/src/common/process_task_results/get_api_info/script.R b/src/common/process_task_results/get_api_info/script.R new file mode 100644 index 0000000000..1686dee222 --- /dev/null +++ b/src/common/process_task_results/get_api_info/script.R @@ -0,0 +1,79 @@ +library(purrr) +library(dplyr) +library(yaml) +library(rlang) + +## VIASH START +par <- list( + input = ".", + task_id = "label_projection", + output = "output/api.json" +) +## VIASH END + +comp_yamls <- list.files(paste(par$input, "src/tasks", par$task_id, "api", sep = "/"), pattern = "comp_", full.names = TRUE) +file_yamls <- list.files(paste(par$input, "src/tasks", par$task_id, "api", sep = "/"), pattern = "file_", full.names = TRUE) + +# list component - file args links +comp_file <- map_df(comp_yamls, function(yaml_file) { + conf <- yaml::read_yaml(yaml_file) + + map_df(conf$functionality$arguments, function(arg) { + tibble( + comp_name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + arg_name = gsub("^-*", "", arg$name), + direction = arg$direction %||% "input", + file_name = basename(arg$`__merge__`) %>% gsub("\\.yaml", "", .) + ) + }) +}) + +# get component info +comp_info <- map_df(comp_yamls, function(yaml_file) { + conf <- yaml::read_yaml(yaml_file) + + tibble( + name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + label = name %>% gsub("comp_", "", .) %>% gsub("_", " ", .) + ) +}) + +# get file info +file_info <- map_df(file_yamls, function(yaml_file) { + arg <- yaml::read_yaml(yaml_file) + + tibble( + name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + description = arg$description, + label = arg$info$label, + example = arg$example, + clean_label = name %>% gsub("file_", "", .) %>% gsub("_", " ", .) + ) +}) + +# get file - slot args +file_slot <- map_df(file_yamls, function(yaml_file) { + arg <- yaml::read_yaml(yaml_file) + + map2_df(names(arg$info$slots), arg$info$slots, function(group_name, slot) { + df <- map_df(slot, as.data.frame) + df$struct <- group_name + df$file_name <- basename(yaml_file) %>% gsub("\\.yaml", "", .) 
+ as_tibble(df) + }) +}) %>% + mutate(multiple = multiple %|% FALSE) + +out <- list( + comp_info = purrr::transpose(comp_info), + file_info = purrr::transpose(file_info), + comp_file_io = purrr::transpose(comp_file), + file_schema = purrr::transpose(file_slot) +) + +jsonlite::write_json( + out, + par$output, + auto_unbox = TRUE, + pretty = TRUE +) diff --git a/src/common/process_task_results/get_dataset_info/config.vsh.yaml b/src/common/process_task_results/get_dataset_info/config.vsh.yaml new file mode 100644 index 0000000000..10247a22ba --- /dev/null +++ b/src/common/process_task_results/get_dataset_info/config.vsh.yaml @@ -0,0 +1,20 @@ +__merge__: ../api/get_info.yaml +functionality: + name: "get_dataset_info" + description: "Extract dataset info and convert to expected format for website results" + resources: + - type: r_script + path: script.R + test_resources: + - type: file + path: /resources_test/common/task_metadata/dataset_info.yaml + dest: test_file.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, yaml, rlang, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/get_dataset_info/script.R b/src/common/process_task_results/get_dataset_info/script.R new file mode 100644 index 0000000000..a2c5317c05 --- /dev/null +++ b/src/common/process_task_results/get_dataset_info/script.R @@ -0,0 +1,54 @@ +requireNamespace("jsonlite", quietly = TRUE) +requireNamespace("yaml", quietly = TRUE) +library(purrr, warn.conflicts = FALSE) +library(rlang, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input = "output/label_projection/dataset_uns.yaml", + output = "output/dataset_info.json" +) +## VIASH END + +datasets <- yaml::yaml.load_file(par$input) + +# transform into format expected by website +outputs <- map(datasets, function(dataset) { + # ↑ the 'dataset' object could be used as the new format + + # TODO: it'd be nice if the s3 path was also included in the dataset info + + # construct v1 format + out <- list( + "task_id" = par$task_id, + "dataset_id" = dataset$dataset_id, + "dataset_name" = dataset$dataset_name, + "dataset_summary" = dataset$dataset_summary, + "dataset_description" = dataset$dataset_description %||% NA_character_, + "data_reference" = dataset$dataset_reference %||% NA_character_, + "data_url" = dataset$dataset_url %||% NA_character_, + "date_created" = dataset$date_created %||% NA_character_, + "file_size" = dataset$file_size %||% NA_character_ + ) + + if (!is.null(dataset[["common_dataset_id"]])) { + out[["common_dataset_id"]] <- dataset[["common_dataset_id"]] + } + + # show warning when certain data is missing and return null? 
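  # Note: the loop below fails hard rather than warning: any field of `out` that
  # is still NULL aborts the component with an informative error instead of
  # emitting incomplete JSON for the website. Optional fields are defaulted to
  # NA_character_ above (via %||%), so only genuinely required metadata
  # (e.g. dataset_id, dataset_name, dataset_summary) can trigger the stop().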
+ for (n in names(out)) { + if (is.null(out[[n]])) { + out_as_str <- jsonlite::toJSON(out, auto_unbox = TRUE, pretty = TRUE) + stop("missing value for value '", n, "' in ", out_as_str) + } + } + + out +}) + +jsonlite::write_json( + outputs, + par$output, + auto_unbox = TRUE, + pretty = TRUE +) diff --git a/src/common/process_task_results/get_method_info/config.vsh.yaml b/src/common/process_task_results/get_method_info/config.vsh.yaml new file mode 100644 index 0000000000..053bbac53c --- /dev/null +++ b/src/common/process_task_results/get_method_info/config.vsh.yaml @@ -0,0 +1,20 @@ +__merge__: ../api/get_info.yaml +functionality: + name: "get_method_info" + description: "Extract method info" + resources: + - type: r_script + path: script.R + test_resources: + - type: file + path: /resources_test/common/task_metadata/method_configs.yaml + dest: test_file.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, yaml, rlang, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/get_method_info/script.R b/src/common/process_task_results/get_method_info/script.R new file mode 100644 index 0000000000..a332413b69 --- /dev/null +++ b/src/common/process_task_results/get_method_info/script.R @@ -0,0 +1,76 @@ +requireNamespace("jsonlite", quietly = TRUE) +requireNamespace("yaml", quietly = TRUE) +library(purrr, warn.conflicts = FALSE) +library(rlang, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input = "output/temp/method_configs.yaml", + output = "output/test/method_info.json" +) +## VIASH END + +configs <- yaml::yaml.load_file(par$input) + +outputs <- map(configs, function(config) { + if (length(config$functionality$status) > 0 && config$functionality$status == "disabled") { + return(NULL) + } + + # prep for viash 0.9.0 + build_info <- config$build_info %||% config$info + if ("functionality" %in% names(config)) { + config[names(config$functionality)] <- config$functionality + config[["functionality"]] <- NULL + } + + info <- config$info + + # add extra info + info$config_path <- gsub(".*/src/", "src/", build_info$config) + info$task_id <- gsub("/.*", "", config$namespace) + info$id <- config$name + info$namespace <- config$namespace + info$commit_sha <- build_info$git_commit %||% "missing-sha" + info$code_version <- "missing-version" + info$implementation_url <- paste0( + build_info$git_remote, "/blob/", + build_info$git_commit, "/", + info$config_path + ) + + # ↑ this could be used as the new format + + # construct v1 format + out <- list( + task_id = info$task_id, + method_id = info$id, + method_name = info$label, + method_summary = info$summary, + method_description = info$description, + is_baseline = grepl("control", info$type), + paper_reference = info$reference %||% NA_character_, + code_url = info$repository_url %||% NA_character_, + implementation_url = info$implementation_url %||% NA_character_, + code_version = NA_character_, + commit_sha = info$commit_sha + ) + + # show warning when certain data is missing and return null? 
+ for (n in names(out)) { + if (is.null(out[[n]])) { + out_as_str <- jsonlite::toJSON(out, auto_unbox = TRUE, pretty = TRUE) + stop("missing value for value '", n, "' in ", out_as_str) + } + } + + # return output + out +}) + +jsonlite::write_json( + outputs, + par$output, + auto_unbox = TRUE, + pretty = TRUE +) \ No newline at end of file diff --git a/src/common/process_task_results/get_metric_info/config.vsh.yaml b/src/common/process_task_results/get_metric_info/config.vsh.yaml new file mode 100644 index 0000000000..ee5833b5b9 --- /dev/null +++ b/src/common/process_task_results/get_metric_info/config.vsh.yaml @@ -0,0 +1,20 @@ +__merge__: ../api/get_info.yaml +functionality: + name: "get_metric_info" + description: "Extract metric info" + resources: + - type: r_script + path: script.R + test_resources: + - type: file + path: /resources_test/common/task_metadata/metric_configs.yaml + dest: test_file.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, yaml, rlang, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/get_metric_info/script.R b/src/common/process_task_results/get_metric_info/script.R new file mode 100644 index 0000000000..5ef8f6b04b --- /dev/null +++ b/src/common/process_task_results/get_metric_info/script.R @@ -0,0 +1,81 @@ +requireNamespace("jsonlite", quietly = TRUE) +requireNamespace("yaml", quietly = TRUE) +library(purrr, warn.conflicts = FALSE) +library(rlang, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input = "output/temp/metric_configs.yaml", + output = "output/metric_info.json" +) +## VIASH END + +configs <- yaml::yaml.load_file(par$input) + +outputs <- map(configs, function(config) { + if (length(config$functionality$status) > 0 && config$functionality$status == "disabled") { + return(NULL) + } + + # prep for viash 0.9.0 + build_info <- config$build_info %||% config$info + if ("functionality" %in% names(config)) { + config[names(config$functionality)] <- config$functionality + config[["functionality"]] <- NULL + } + + map( + config$info$metrics, + function(info) { + # add extra info + info$config_path <- gsub(".*/src/", "src/", build_info$config) + info$task_id <- gsub("/.*", "", config$namespace) + info$id <- info$name + info$component_id <- config$name + info$namespace <- config$namespace + info$commit_sha <- build_info$git_commit %||% "missing-sha" + info$code_version <- "missing-version" + info$implementation_url <- paste0( + build_info$git_remote, "/blob/", + build_info$git_commit, "/", + info$config_path + ) + + # ↑ this could be used as the new format + + # construct v1 format + out <- list( + task_id = info$task_id, + metric_id = info$id, + metric_name = info$label, + metric_summary = info$summary, + metric_description = info$description, + paper_reference = info$reference %||% NA_character_, + implementation_url = info$implementation_url %||% NA_character_, + code_version = NA_character_, + commit_sha = info$commit_sha, + maximize = info$maximize + ) + + # show warning when certain data is missing and return null? 
+ for (n in names(out)) { + if (is.null(out[[n]])) { + out_as_str <- jsonlite::toJSON(out, auto_unbox = TRUE, pretty = TRUE) + stop("missing value for value '", n, "' in ", out_as_str) + } + } + + # return output + out + } + ) +}) + +outputs <- unlist(outputs, recursive = FALSE) + +jsonlite::write_json( + outputs, + par$output, + auto_unbox = TRUE, + pretty = TRUE +) \ No newline at end of file diff --git a/src/common/process_task_results/get_results/config.vsh.yaml b/src/common/process_task_results/get_results/config.vsh.yaml new file mode 100644 index 0000000000..cd639fad4d --- /dev/null +++ b/src/common/process_task_results/get_results/config.vsh.yaml @@ -0,0 +1,51 @@ +functionality: + name: "get_results" + description: "Extract execution info" + namespace: common/process_task_results + arguments: + - name: "--task_id" + type: "string" + example: "batch_integration" + description: "Task id" + - name: "--input_scores" + type: "file" + example: score_uns.yaml + description: "Scores file" + - name: "--input_execution" + type: "file" + example: trace.txt + description: "Nextflow log file" + - name: "--input_dataset_info" + type: "file" + example: dataset_info.json + description: "Method info file" + - name: "--input_method_info" + type: "file" + example: method_info.json + description: "Method info file" + - name: "--input_metric_info" + type: "file" + example: metric_info.json + description: "Metric info file" + - name: "--output_results" + type: "file" + direction: "output" + default: "results.json" + description: "Output json" + - name: "--output_metric_execution_info" + type: "file" + direction: "output" + default: "metric_execution_info.json" + description: "Output metric execution info" + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, yaml, rlang, dplyr, tidyr, readr, lubridate, dynutils, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/get_results/script.R b/src/common/process_task_results/get_results/script.R new file mode 100644 index 0000000000..822562aa18 --- /dev/null +++ b/src/common/process_task_results/get_results/script.R @@ -0,0 +1,237 @@ +requireNamespace("jsonlite", quietly = TRUE) +requireNamespace("yaml", quietly = TRUE) +requireNamespace("dynutils", quietly = TRUE) +requireNamespace("readr", quietly = TRUE) +requireNamespace("lubridate", quietly = TRUE) +library(dplyr, warn.conflicts = FALSE) +library(tidyr, warn.conflicts = FALSE) +library(purrr, warn.conflicts = FALSE) +library(rlang, warn.conflicts = FALSE) + +## VIASH START +dir <- "work/c1/6660ea0cc6155d7e13fa341d16057b/_viash_par" +par <- list( + task_id = "task_1", + input_scores = paste0(dir, "/input_scores_1/score_uns.yaml"), + input_execution = paste0(dir, "/input_execution_1/trace.txt"), + input_dataset_info = paste0(dir, "/input_dataset_info_1/output.json"), + input_method_info = paste0(dir, "/input_method_info_1/output.json"), + input_metric_info = paste0(dir, "/input_metric_info_1/output.json"), + output_results = "output/results.json", + output_metric_execution_info = "output/metric_execution_info.json" +) +## VIASH END + +# --- helper functions --------------------------------------------------------- +cat("Loading helper functions\n") +parse_exit <- function(x) { + if (is.na(x) || x == "-") { + NA_integer_ + } else { + as.integer(x) + } +} +parse_duration <- function(x) { + if (is.na(x) || x == "-") { + NA_real_ + } else { + 
as.numeric(lubridate::duration(toupper(x))) + } +} +parse_cpu <- function(x) { + if (is.na(x) || x == "-") { + NA_real_ + } else { + as.numeric(gsub(" *%", "", x)) + } +} +parse_size <- function(x) { + out <- + if (is.na(x) || x == "-") { + NA_integer_ + } else if (grepl("GB", x)) { + as.numeric(gsub(" *GB", "", x)) * 1024 + } else if (grepl("MB", x)) { + as.numeric(gsub(" *MB", "", x)) + } else if (grepl("KB", x)) { + as.numeric(gsub(" *KB", "", x)) / 1024 + } else if (grepl("B", x)) { + as.numeric(gsub(" *B", "", x)) / 1024 / 1024 + } else { + NA_integer_ + } + as.integer(ceiling(out)) +} + +# --- read input files --------------------------------------------------------- +cat("Reading input files\n") +# read scores +raw_scores <- + yaml::yaml.load_file(par$input_scores) %>% + map_df(function(x) { + tryCatch({ + as_tibble(as.data.frame( + x[c("dataset_id", "method_id", "metric_ids", "metric_values")] + )) + }, error = function(e) { + message("Encountered error while reading scores: ", e$message) + NULL + }) + }) + +# read metric info +dataset_info <- jsonlite::read_json(par$input_dataset_info, simplifyVector = TRUE) +method_info <- jsonlite::read_json(par$input_method_info, simplifyVector = TRUE) +metric_info <- jsonlite::read_json(par$input_metric_info, simplifyVector = TRUE) + +# --- process scores and execution info ---------------------------------------- +cat("Processing scores and execution info\n") +scale_scores <- function(values, is_control, maximize) { + control_values <- values[is_control & !is.na(values)] + if (length(control_values) < 2) { + return(NA_real_) + } + + min_control_value <- min(control_values) + max_control_value <- max(control_values) + + if (min_control_value == max_control_value) { + return(NA_real_) + } + + scaled <- (values - min_control_value) / (max_control_value - min_control_value) + + if (maximize) { + scaled + } else { + 1 - scaled + } +} +aggregate_scores <- function(scaled_score) { + mean(pmin(1, pmax(0, scaled_score)) %|% 0) +} +scores <- raw_scores %>% + complete( + dataset_id, + method_id, + metric_ids, + fill = list(metric_values = NA_real_) + ) %>% + left_join(method_info %>% select(method_id, is_baseline), by = "method_id") %>% + left_join(metric_info %>% select(metric_ids = metric_id, maximize), by = "metric_ids") %>% + group_by(metric_ids, dataset_id) %>% + mutate(scaled_score = scale_scores(metric_values, is_baseline, maximize[[1]]) %|% 0) %>% + group_by(dataset_id, method_id) %>% + summarise( + metric_values = list(as.list(setNames(metric_values, metric_ids))), + scaled_scores = list(as.list(setNames(scaled_score, metric_ids))), + mean_score = aggregate_scores(scaled_score), + .groups = "drop" + ) + +# read nxf log and process the task id +norm_methods <- "/log_cp10k|/log_cpm|/sqrt_cp10k|/sqrt_cpm|/l1_sqrt|/log_scran_pooling" +id_regex <- paste0("^.*:(.*)_process \\(([^\\.]*)(", norm_methods, ")?(.[^\\.]*)?\\.(.*)\\)$") + +trace <- readr::read_tsv(par$input_execution) %>% + mutate( + id = name, + process_id = stringr::str_extract(id, id_regex, 1L), + dataset_id = stringr::str_extract(id, id_regex, 2L), + normalization_id = gsub("^/", "", stringr::str_extract(id, id_regex, 3L)), + grp4 = gsub("^\\.", "", stringr::str_extract(id, id_regex, 4L)), + grp5 = stringr::str_extract(id, id_regex, 5L), + submit = strptime(submit, "%Y-%m-%d %H:%M:%S"), + ) %>% + # detect whether entry is a metric or a method + mutate( + method_id = ifelse(is.na(grp4), grp5, grp4), + metric_id = ifelse(is.na(grp4), grp4, grp5) + ) %>% + select(-grp4, -grp5) %>% + 
filter(!is.na(method_id)) %>% + # take last entry for each run + arrange(desc(submit)) %>% + group_by(name) %>% + slice(1) %>% + ungroup() + +# parse values +execution_info <- trace %>% + filter(process_id == method_id) %>% # only keep method entries + rowwise() %>% + transmute( + dataset_id, + normalization_id, + method_id, + resources = list(list( + exit_code = parse_exit(exit), + duration_sec = parse_duration(realtime), + cpu_pct = parse_cpu(`%cpu`), + peak_memory_mb = parse_size(peak_vmem), + disk_read_mb = parse_size(rchar), + disk_write_mb = parse_size(wchar) + )) + ) %>% + ungroup() + +# combine scores with execution info +# fill up missing entries with NAs and 0s +metric_ids <- unique(raw_scores$metric_ids) +rep_names <- function(val) { + setNames( + as.list(rep(val, length(metric_ids))), + metric_ids + ) +} +out <- full_join( + scores, + execution_info, + by = c("method_id", "dataset_id") +) %>% + rowwise() %>% + mutate( + task_id = par$task_id, + metric_values = list(metric_values %||% rep_names(NA_real_)), + scaled_scores = list(scaled_scores %||% rep_names(0)), + mean_score = mean_score %|% 0, + ) %>% + ungroup() + + +# --- process metric execution info -------------------------------------------- +cat("Processing metric execution info\n") +metric_execution_info <- trace %>% + filter(process_id == metric_id) %>% # only keep metric entries + rowwise() %>% + transmute( + dataset_id, + normalization_id, + method_id, + metric_id, + resources = list(list( + exit_code = parse_exit(exit), + duration_sec = parse_duration(realtime), + cpu_pct = parse_cpu(`%cpu`), + peak_memory_mb = parse_size(peak_vmem), + disk_read_mb = parse_size(rchar), + disk_write_mb = parse_size(wchar) + )) + ) %>% + ungroup() + +# --- write output files ------------------------------------------------------- +cat("Writing output files\n") +# write output files +jsonlite::write_json( + purrr::transpose(out), + par$output_results, + auto_unbox = TRUE, + pretty = TRUE +) +jsonlite::write_json( + purrr::transpose(metric_execution_info), + par$output_metric_execution_info, + auto_unbox = TRUE, + pretty = TRUE +) diff --git a/src/common/process_task_results/get_task_info/config.vsh.yaml b/src/common/process_task_results/get_task_info/config.vsh.yaml new file mode 100644 index 0000000000..2e8fbd2b66 --- /dev/null +++ b/src/common/process_task_results/get_task_info/config.vsh.yaml @@ -0,0 +1,20 @@ +__merge__: ../api/get_info.yaml +functionality: + name: "get_task_info" + description: "Extract task info" + resources: + - type: r_script + path: script.R + test_resources: + - type: file + path: /resources_test/common/task_metadata/task_info.yaml + dest: test_file.yaml +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ purrr, yaml, rlang, processx ] + - type: nextflow + directives: + label: [lowmem, lowtime, lowcpu] diff --git a/src/common/process_task_results/get_task_info/script.R b/src/common/process_task_results/get_task_info/script.R new file mode 100644 index 0000000000..71f1cb777a --- /dev/null +++ b/src/common/process_task_results/get_task_info/script.R @@ -0,0 +1,40 @@ +requireNamespace("jsonlite", quietly = TRUE) +requireNamespace("yaml", quietly = TRUE) +library(purrr, warn.conflicts = FALSE) +library(rlang, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input = "output/temp/task_info.yaml", + output = "output/test/task_info.json" +) +## VIASH END + +info <- yaml::yaml.load_file(par$input) +# ↑ this could be used as the new format + +# construct v1 format +out 
<- list( + task_id = info$name, + commit_sha = NA_character_, + task_name = info$label, + task_summary = info$summary, + task_description = paste0(info$motivation, "\n\n", info$description), + repo = "openproblems-bio/openproblems", + authors = info$authors +) + +# show warning when certain data is missing and return null? +for (n in names(out)) { + if (is.null(out[[n]])) { + out_as_str <- jsonlite::toJSON(out, auto_unbox = TRUE, pretty = TRUE) + stop("missing value for value '", n, "' in ", out_as_str) + } +} + +jsonlite::write_json( + out, + par$output, + auto_unbox = TRUE, + pretty = TRUE +) diff --git a/src/common/process_task_results/run/config.vsh.yaml b/src/common/process_task_results/run/config.vsh.yaml new file mode 100644 index 0000000000..d746a54245 --- /dev/null +++ b/src/common/process_task_results/run/config.vsh.yaml @@ -0,0 +1,91 @@ +functionality: + name: run + namespace: common/process_task_results + description: >- + This workflow transforms the meta information of the results into a format + that can be used by the website. + argument_groups: + - name: Inputs + arguments: + - name: "--input_scores" + type: file + required: true + direction: input + description: A yaml file containing the scores of each of the methods + example: score_uns.yaml + - name: "--input_method_configs" + type: file + required: true + direction: input + example: method_configs.yaml + - name: "--input_metric_configs" + type: file + required: true + direction: input + example: metric_configs.yaml + - name: "--input_dataset_info" + type: file + required: true + direction: input + example: dataset_info.yaml + - name: "--input_execution" + type: file + required: true + direction: input + example: trace.txt + - name: "--input_task_info" + type: file + required: true + direction: input + example: task_info.yaml + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: results.json + - name: "--output_method_info" + type: file + required: true + direction: output + default: method_info.json + - name: "--output_metric_info" + type: file + required: true + direction: output + default: metric_info.json + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_info.json + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.json + - name: "--output_qc" + type: file + required: true + direction: output + default: quality_control.json + - name: "--output_metric_execution_info" + type: file + required: true + direction: output + default: metric_execution_info.json + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + dependencies: + - name: common/process_task_results/get_results + - name: common/process_task_results/get_method_info + - name: common/process_task_results/get_metric_info + - name: common/process_task_results/get_dataset_info + - name: common/process_task_results/get_task_info + - name: common/process_task_results/generate_qc +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/common/process_task_results/run/main.nf b/src/common/process_task_results/run/main.nf new file mode 100644 index 0000000000..dadbcfa1f6 --- /dev/null +++ b/src/common/process_task_results/run/main.nf @@ -0,0 +1,91 @@ +// workflow auto { +// findStates(params, meta.config) +// | meta.workflow.run( +// auto: [publish: "state"] +// ) +// } + +workflow run_wf { 
+ take: + input_ch + + main: + output_ch = input_ch + + | get_task_info.run( + key: "task_info", + fromState: [ + "input": "input_task_info" + ], + toState: ["output_task": "output"] + ) + + // extract task id from task info + | map { id, state -> + def task_id = readJson(state.output_task).task_id + [id, state + ["task_id": task_id]] + } + + | get_method_info.run( + fromState: [ + "input": "input_method_configs", + "task_id" : "task_id" + ], + toState: ["output_method": "output"] + ) + + | get_metric_info.run( + fromState: [ + "input": "input_metric_configs", + "task_id" : "task_id" + ], + toState: ["output_metric": "output"] + ) + + | get_dataset_info.run( + fromState: [ + "task_id" : "task_id", + "input": "input_dataset_info", + ], + toState: ["output_dataset": "output"] + ) + + | get_results.run( + fromState: [ + "task_id": "task_id", + "input_scores": "input_scores", + "input_execution": "input_execution", + "input_dataset_info": "output_dataset", + "input_method_info": "output_method", + "input_metric_info": "output_metric" + ], + toState: [ + "output_results": "output_results", + "output_metric_execution_info": "output_metric_execution_info" + ] + ) + + | generate_qc.run( + fromState: [ + "task_info": "output_task", + "method_info": "output_method", + "metric_info": "output_metric", + "dataset_info": "output_dataset", + "results": "output_results" + ], + toState: ["output_qc": "output"] + ) + + | setState([ + "output_scores": "output_results", + "output_method_info": "output_method", + "output_metric_info": "output_metric", + "output_dataset_info": "output_dataset", + "output_task_info": "output_task", + "output_qc": "output_qc", + "output_metric_execution_info": "output_metric_execution_info" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/common/process_task_results/run/run_nf_tower_test.sh b/src/common/process_task_results/run/run_nf_tower_test.sh new file mode 100644 index 0000000000..ca74e357a1 --- /dev/null +++ b/src/common/process_task_results/run/run_nf_tower_test.sh @@ -0,0 +1,38 @@ +#!/bin/bash + +DATASETS_DIR="s3://openproblems-data/resources/batch_integration/results/" + +# try running on nf tower +cat > /tmp/params.yaml << 'HERE' +id: batch_integration_transform +input_scores: "$DATASETS_DIR/scores.yaml" +input_dataset_info: "$DATASETS_DIR/dataset_info.yaml" +input_method_configs: "$DATASETS_DIR/method_configs.yaml" +input_metric_configs: "$DATASETS_DIR/metric_configs.yaml" +input_execution: "$DATASETS_DIR/trace.txt" +input_task_info: "$DATASETS_DIR/task_info.yaml" +task_id: "batch_integration" +output_scores: "results.json" +output_method_info: "method_info.json" +output_metric_info: "metric_info.json" +output_dataset_info: "dataset_info.json" +output_task_info: "task_info.json" +publish_dir: $DATASETS_DIR +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} + + +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/common/workflows/transform_meta/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --config /tmp/nextflow.config \ No newline at end of file diff --git a/src/common/process_task_results/run/run_test.sh b/src/common/process_task_results/run/run_test.sh new file mode 100755 index 0000000000..762785b754 --- /dev/null +++ b/src/common/process_task_results/run/run_test.sh @@ -0,0 +1,43 @@ +#!/bin/bash + +# fail on error +set -e + +# ensure we're 
in the root of the repo +REPO_ROOT=$(git rev-parse --show-toplevel) +cd "$REPO_ROOT" + +for TASK in "denoising" "dimensionality_reduction" "batch_integration" "label_projection" "match_modalities" "predict_modality"; do +# for TASK in "label_projection"; do + BASE_DIR="s3://openproblems-data/resources/$TASK/results/" + + # find subdir in bucket with latest date + DATE=$(aws s3 ls $BASE_DIR --recursive | awk '{print $4}' | grep 'task_info.yaml' | sort -r | head -n 1 | sed 's#.*/run_\(.*\)/[^/]*$#\1#') + + INPUT_DIR="$BASE_DIR/run_$DATE" + OUTPUT_DIR="../website/results/$TASK/data" + + # # temp sync + # aws s3 sync $INPUT_DIR output/temp + + echo "Processing $TASK - $DATE" + + # start the run + NXF_VER=23.10.0 nextflow run . \ + -main-script target/nextflow/common/process_task_results/run/main.nf \ + -profile docker \ + -resume \ + -c src/wf_utils/labels_ci.config \ + --id "process" \ + --input_scores "$INPUT_DIR/score_uns.yaml" \ + --input_dataset_info "$INPUT_DIR/dataset_uns.yaml" \ + --input_method_configs "$INPUT_DIR/method_configs.yaml" \ + --input_metric_configs "$INPUT_DIR/metric_configs.yaml" \ + --input_execution "$INPUT_DIR/trace.txt" \ + --input_task_info "$INPUT_DIR/task_info.yaml" \ + --output_state "state.yaml" \ + --publish_dir "$OUTPUT_DIR" + + # cause quarto rerender to index page when in preview mode + touch ../website/results/$TASK/index.qmd +done \ No newline at end of file diff --git a/src/common/process_task_results/yaml_to_json/config.vsh.yaml b/src/common/process_task_results/yaml_to_json/config.vsh.yaml new file mode 100644 index 0000000000..7231cdcdbf --- /dev/null +++ b/src/common/process_task_results/yaml_to_json/config.vsh.yaml @@ -0,0 +1,16 @@ +__merge__: ../api/get_info.yaml +functionality: + name: "yaml_to_json" + description: "convert yaml file to json file" + resources: + - type: python_script + path: script.py + test_resources: + - type: file + path: /resources_test/common/task_metadata/dataset_info.yaml + dest: test_file.yaml +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + - type: native diff --git a/src/common/process_task_results/yaml_to_json/script.py b/src/common/process_task_results/yaml_to_json/script.py new file mode 100644 index 0000000000..45f6374515 --- /dev/null +++ b/src/common/process_task_results/yaml_to_json/script.py @@ -0,0 +1,16 @@ +import yaml +import json + +## VIASH START +par = { + "input": ".", + "task_id": "denoising", + "output": "output/task.json", +} +## VIASH END + +with open(par["input"], "r") as f: + yaml_file = yaml.safe_load(f) + +with open(par["output"], "w") as out: + json.dump(yaml_file, out, indent=2) diff --git a/src/common/resources_test_scripts/aws_sync.sh b/src/common/resources_test_scripts/aws_sync.sh new file mode 100644 index 0000000000..0541df125a --- /dev/null +++ b/src/common/resources_test_scripts/aws_sync.sh @@ -0,0 +1,7 @@ +#!/bin/bash + +echo "Run the command in this script manually" +exit 1 + +aws s3 sync "resources_test" "s3://openproblems-data/resources_test" --exclude "*/temp*" --exclude "*/tmp*" --delete --dryrun +aws s3 sync "resources" "s3://openproblems-data/resources" --exclude */temp_* --delete --dryrun diff --git a/src/common/resources_test_scripts/task_metadata.sh b/src/common/resources_test_scripts/task_metadata.sh new file mode 100755 index 0000000000..cd9072f443 --- /dev/null +++ b/src/common/resources_test_scripts/task_metadata.sh @@ -0,0 +1,139 @@ +#!/bin/bash + +# make sure folloewing command has been executed +# viash ns build -q 'common' + +# get 
the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +DATASETS_DIR="resources_test/batch_integration" +OUTPUT_DIR="resources_test/common/task_metadata" + + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +# Create small git sha input file +sha_file="$OUTPUT_DIR/input_git_sha.json" + +cat < $sha_file +[ + { + "path": "tasks/denoising/README.md", + "last_modified": "2022-09-20 14:26:51 -0400", + "sha": "3fe9251ba906061b6769eed2ac9da0db5f8e26bb" + }, + { + "path": "tasks/denoising/__init__.py", + "last_modified": "2022-09-30 14:49:17 +0200", + "sha": "c97decf07adb2e3050561d6fa9ae46132be07bef" + }, + { + "path": "tasks/denoising/api.py", + "last_modified": "2022-10-21 13:56:15 -0400", + "sha": "b460ecb183328c857cbbf653488f522a4034a61c" + }, + { + "path": "tasks/denoising/datasets/__init__.py", + "last_modified": "2022-11-23 10:32:02 -0500", + "sha": "725ff0c46140aaa6bbded68646256f64bc63df6d" + }, + { + "path": "tasks/denoising/datasets/pancreas.py", + "last_modified": "2022-12-04 12:06:43 -0500", + "sha": "4bb8a7e04545a06c336d3d9364a1dd84fa2af1a4" + }, + { + "path": "tasks/denoising/datasets/pbmc.py", + "last_modified": "2022-12-04 12:06:43 -0500", + "sha": "4bb8a7e04545a06c336d3d9364a1dd84fa2af1a4" + }, + { + "path": "tasks/denoising/datasets/tabula_muris_senis.py", + "last_modified": "2022-12-04 12:06:43 -0500", + "sha": "4bb8a7e04545a06c336d3d9364a1dd84fa2af1a4" + }, + { + "path": "tasks/denoising/datasets/utils.py", + "last_modified": "2022-11-15 17:19:16 -0500", + "sha": "c2470ce02e6f196267cec1c554ba7ae389c0956a" + }, + { + "path": "tasks/denoising/methods/__init__.py", + "last_modified": "2022-10-21 13:56:15 -0400", + "sha": "b460ecb183328c857cbbf653488f522a4034a61c" + }, + { + "path": "tasks/denoising/methods/alra.R", + "last_modified": "2022-05-16 15:10:42 -0400", + "sha": "ba06cf71b564eb23823a662341055dc5ac2be231" + }, + { + "path": "tasks/denoising/methods/alra.py", + "last_modified": "2022-07-25 12:29:34 -0400", + "sha": "411a416150ecabce25e1f59bde422a029d0a8baa" + }, + { + "path": "tasks/denoising/methods/baseline.py", + "last_modified": "2022-10-21 13:56:15 -0400", + "sha": "b460ecb183328c857cbbf653488f522a4034a61c" + }, + { + "path": "tasks/denoising/methods/dca.py", + "last_modified": "2022-12-01 15:38:21 -0500", + "sha": "aa2253779e9aa9cd178f54ac0f3b6ba521ecd59f" + }, + { + "path": "tasks/denoising/methods/knn_smoothing.py", + "last_modified": "2022-11-14 11:54:15 -0500", + "sha": "bbecf4e9ad90007c2711394e7fbd8e49cbd3e4a1" + }, + { + "path": "tasks/denoising/methods/magic.py", + "last_modified": "2022-11-14 11:57:35 -0500", + "sha": "2af9a4918ed3370859f71774558068961f6d22c6" + }, + { + "path": "tasks/denoising/metrics/__init__.py", + "last_modified": "2021-01-19 13:31:20 -0500", + "sha": "8e0600c516c392fa747137415b6a93b8af0f61d8" + }, + { + "path": "tasks/denoising/metrics/mse.py", + "last_modified": "2022-11-15 17:19:16 -0500", + "sha": "c2470ce02e6f196267cec1c554ba7ae389c0956a" + }, + { + "path": "tasks/denoising/metrics/poisson.py", + "last_modified": "2022-12-04 12:06:43 -0500", + "sha": "4bb8a7e04545a06c336d3d9364a1dd84fa2af1a4" + } +] +EOT + +# Create all metadata +export NXF_VER=22.04.5 + +nextflow run . 
\ + -main-script target/nextflow/batch_integration/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -c src/wf_utils/labels_ci.config \ + -with-trace \ + -entry auto \ + --input_states "$DATASETS_DIR/pancreas/state.yaml" \ + --rename_keys 'input_dataset:output_dataset,input_solution:output_solution' \ + --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml", "method_ids": ["bbknn", "mnnpy", "mnnr"]}' \ + --publish_dir "$OUTPUT_DIR" \ + --output_state "state.yaml" + +cp trace.txt "$OUTPUT_DIR/trace.txt" + + +viash run src/common/process_task_results/get_method_info/config.vsh.yaml -- --input "$OUTPUT_DIR/method_configs.yaml" --output "$OUTPUT_DIR/method_info.json" diff --git a/src/common/schemas/api_component.yaml b/src/common/schemas/api_component.yaml new file mode 100644 index 0000000000..b197e2e367 --- /dev/null +++ b/src/common/schemas/api_component.yaml @@ -0,0 +1,67 @@ +title: Component API +description: | + A component type specification file. +type: object +required: [functionality] +properties: + functionality: + type: object + description: Information regarding the functionality of the component. + required: [namespace, info, arguments, test_resources] + additionalProperties: false + properties: + namespace: + "$ref": "defs_common.yaml#/definitions/Namespace" + info: + type: object + description: Metadata of the component. + additionalProperties: false + required: [type, type_info] + properties: + type: + "$ref": "defs_common.yaml#/definitions/ComponentType" + subtype: + "$ref": "defs_common.yaml#/definitions/ComponentSubtype" + type_info: + type: object + description: Metadata related to the component type. + required: [label, summary, description] + properties: + label: + $ref: "defs_common.yaml#/definitions/Label" + summary: + $ref: "defs_common.yaml#/definitions/Summary" + description: + $ref: "defs_common.yaml#/definitions/Description" + arguments: + type: array + description: Component-specific parameters. + items: + anyOf: + - $ref: 'defs_common.yaml#/definitions/ComponentAPIFile' + - $ref: 'defs_viash.yaml#/definitions/BooleanArgument' + - $ref: 'defs_viash.yaml#/definitions/BooleanArgument' + - $ref: 'defs_viash.yaml#/definitions/BooleanTrueArgument' + - $ref: 'defs_viash.yaml#/definitions/BooleanFalseArgument' + - $ref: 'defs_viash.yaml#/definitions/DoubleArgument' + - $ref: 'defs_viash.yaml#/definitions/IntegerArgument' + - $ref: 'defs_viash.yaml#/definitions/LongArgument' + - $ref: 'defs_viash.yaml#/definitions/StringArgument' + resources: + type: array + description: Resources required to run the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + test_resources: + type: array + description: One or more scripts and resources used to test the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + platforms: + type: array + description: A list of platforms which Viash generates target artifacts for. + items: + anyOf: + - "$ref": "defs_common.yaml#/definitions/PlatformDocker" + - "$ref": "defs_common.yaml#/definitions/PlatformNative" + - "$ref": "defs_common.yaml#/definitions/PlatformVdsl3" diff --git a/src/common/schemas/api_file.yaml b/src/common/schemas/api_file.yaml new file mode 100644 index 0000000000..6294439eda --- /dev/null +++ b/src/common/schemas/api_file.yaml @@ -0,0 +1,26 @@ +title: File API +description: A file format specification file. 
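As an illustration of how such a file specification is consumed downstream (for example by get_api_info above, which flattens info.slots into its file_schema table), here is a hedged Python sketch that lists the declared slots per AnnData struct. The file path and the exact per-slot fields (name, type, required) are assumptions made for the example, not guaranteed by this schema alone.

# Hedged sketch: enumerate the slots declared in a task's file_*.yaml spec,
# grouped per AnnData struct (obs, var, obsm, layers, uns, ...), similar to the
# file_schema table built by get_api_info/script.R. Path and slot fields are
# illustrative assumptions.
import yaml

with open("src/tasks/label_projection/api/file_dataset.yaml", "r") as f:  # hypothetical path
    spec = yaml.safe_load(f)

print(f'{spec["info"]["label"]}: {spec["info"]["summary"]}')
for struct, slots in spec["info"].get("slots", {}).items():
    for slot in slots:
        required = slot.get("required", True)  # assumed default
        print(f'  {struct}/{slot.get("name")} ({slot.get("type")}) required={required}')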
+type: object +additionalProperties: false +required: [type, example, info] +properties: + type: + const: file + example: + description: A file in the `resources_test` folder which is an example of this file format. + type: string + __merge__: + $ref: "defs_common.yaml#/definitions/Merge" + info: + description: 'Structured information. Can be any shape: a string, vector, map or even nested map.' + type: object + required: [label, summary] + properties: + label: + $ref: "defs_common.yaml#/definitions/Label" + summary: + $ref: "defs_common.yaml#/definitions/Summary" + description: + $ref: "defs_common.yaml#/definitions/Description" + slots: + $ref: "defs_common.yaml#/definitions/AnnDataSlots" diff --git a/src/common/schemas/defs_common.yaml b/src/common/schemas/defs_common.yaml new file mode 100644 index 0000000000..60b9946210 --- /dev/null +++ b/src/common/schemas/defs_common.yaml @@ -0,0 +1,256 @@ +definitions: + PlatformVdsl3: + title: VDSL3 + description: Next-gen platform for generating NextFlow VDSL3 modules. + properties: + type: + const: nextflow + description: Next-gen platform for generating NextFlow VDSL3 modules. + directives: + $ref: 'defs_viash.yaml#/definitions/NextflowDirectives' + required: [ type ] + additionalProperties: false + PlatformDocker: + title: Docker platform + description: | + Run a Viash component on a Docker backend platform. + By specifying which dependencies your component needs, users are be able to build + a docker container from scratch using the setup flag, or pull it from a docker repository. + type: object + properties: + type: + const: docker + description: Run a Viash component on a Docker backend platform. + image: + type: string + description: The base container to start from. You can also add the tag here + if you wish. + run_args: + anyOf: + - type: string + description: Add docker run arguments. + - type: array + items: + type: string + description: Add docker run arguments. + target_image_source: + type: string + description: The source of the target image. This is used for defining labels + in the dockerfile. + setup: + type: array + items: + "$ref": "defs_viash.yaml#/definitions/Requirements" + test_setup: + type: array + items: + "$ref": "defs_viash.yaml#/definitions/Requirements" + required: [type, image] + additionalProperties: false + PlatformNative: + title: Native platform + type: object + properties: + type: + const: native + description: Specifies the type of the platform. Running a Viash component + on a native platform means that the script will be executed in your current + environment. + required: [ type ] + additionalProperties: false + PreferredNormalization: + enum: [l1_sqrt, log_cpm, log_cp10k, log_scran_pooling, sqrt_cpm, sqrt_cp10k, counts] + description: | + Which normalization method a component prefers. + + Each value corresponds to a normalization component in the directory `src/datasets/normalization`. + ComponentSubtype: + type: string + description: | + A component subtype, in case the task has multiple subtypes of methods and metrics. + ComponentType: + type: string + description: | + A component subtype, in case the task has multiple subtypes of methods and metrics. + Name: + type: string + description: | + A unique identifier. Can only contain lowercase letters, numbers or underscores. + pattern: "^[a-z_][a-z0-9_]*$" + maxLength: 50 + Namespace: + type: string + description: | + The namespace a component is part of. 
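+# Illustrative example values for the definitions above (hypothetical, for reference only):
+#   name: bbknn
+#   namespace: batch_integration/methods
+#   preferred_normalization: log_cp10k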
+ pattern: "^[a-z_][a-z0-9_/]*$" + Label: + type: string + description: | + A unique, human-readable, short label. Used for creating summary tables and visualisations. + maxLength: 50 + Image: + type: string + description: | + The name of the image file to use for the component on the website. + Summary: + type: string + description: | + A one sentence summary of purpose and methodology. Used for creating an overview tables. + minLength: 15 + maxLength: 180 + Description: + type: string + description: | + A longer description (one or more paragraphs). Used for creating reference documentation and supplementary information. + minLength: 30 + BibtexReference: + type: string + description: | + A bibtex reference key to the paper where the component is described. + DocumentationURL: + type: string + format: uri + pattern: "^https://" + description: The url to the documentation of the used software library. + RepositoryURL: + type: string + format: uri + pattern: "^https://" + description: The url to the repository of the used software library. + MethodVariants: + type: object + description: Alternative parameter sets which should be evaluated in the benchmark. + properties: + preferred_normalization: + "$ref": "#/definitions/PreferredNormalization" + CompAPIMerge: + type: string + description: | + The API specifies which type of component this is. + It contains specifications for: + + - The input/output files + - Common parameters + - A unit test + Merge: + type: string + description: | + Another YAML to inherit values from. + ComponentAPIFile: + description: A `file` type argument has a string value that points to a file or folder path. + type: object + properties: + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f` or `foo`. The number of dashes determines how values can be passed: \n\n - `--foo` is a long option, which can be passed with `executable_name --foo=value` or `executable_name --foo value`\n - `-f` is a short option, which can be passed with `executable_name -f value`\n - `foo` is an argument, which can be passed with `executable_name value` \n" + type: string + __merge__: + type: string + description: The file format specification file. + direction: + description: Makes this argument an `input` or an `output`, as in does the file/folder needs to be read or written. `input` by default. + $ref: 'defs_viash.yaml#/definitions/Direction' + info: + description: 'Structured information. Can be any shape: a string, vector, map or even nested map.' + type: object + required: + description: Make the value for this argument required. If set to `true`, an error will be produced if no value was provided. `false` by default. 
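+# Illustrative sketch (hypothetical name and merge target) of an argument entry
+# conforming to ComponentAPIFile as defined above:
+#   - name: "--input"
+#     __merge__: file_dataset.yaml
+#     direction: input
+#     required: true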
+ type: boolean + required: [name, __merge__, direction, required] + additionalProperties: false + AnnDataSlots: + properties: + X: + $ref: "#/definitions/AnnDataSlot" + layers: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + var: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + varm: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + varp: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + obs: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + obsm: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + obsp: + type: array + items: + $ref: "#/definitions/AnnDataSlot" + uns: + type: array + items: + oneOf: + - $ref: "#/definitions/AnnDataSlot" + - $ref: "#/definitions/AnnDataSlotObject" + AnnDataSlot: + properties: + type: + enum: [integer, double, string, boolean] + name: + type: string + description: A unique identifier. + pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$" + description: + type: string + required: + type: boolean + required: [type, name, description, required] + AnnDataSlotObject: + properties: + type: + enum: [object] + name: + type: string + description: A unique identifier. + pattern: "^[a-zA-Z_][a-zA-Z0-9_]*$" + description: + type: string + required: + type: boolean + required: [type, name, description, required] + Author: + description: Author metadata. + type: object + additionalProperties: false + properties: + name: + description: Full name of the author, usually in the name of FirstName MiddleName LastName. + type: string + info: + description: Additional information on the author + type: object + additionalProperties: false + properties: + github: + type: string + orcid: + type: string + email: + type: string + twitter: + type: string + linkedin: + type: string + roles: + description: | + Role of the author. Possible values: + + * `"author"`: Authors who have made substantial contributions to the component. + * `"maintainer"`: The maintainer of the component. + * `"contributor"`: Authors who have made smaller contributions (such as code patches etc.). + type: array + items: + enum: [maintainer, author, contributor] \ No newline at end of file diff --git a/src/common/schemas/defs_viash.yaml b/src/common/schemas/defs_viash.yaml new file mode 100644 index 0000000000..fff25ab382 --- /dev/null +++ b/src/common/schemas/defs_viash.yaml @@ -0,0 +1,2252 @@ +$schema: "https://json-schema.org/draft-07/schema#" +title: Viash config schema definitions. +oneOf: + - $ref: "#/definitions/Config" +definitions: + Config: + description: "A Viash Config" + properties: + functionality: + description: "The functionality-part of the config file describes the behaviour\ + \ of the script in terms of arguments and resources.\nBy specifying a few restrictions\ + \ (e.g. 
mandatory arguments) and adding some descriptions, Viash will automatically\ + \ generate a stylish command-line interface for you.\n" + $ref: "#/definitions/Functionality" + platforms: + description: "Definition of the platforms" + type: "array" + items: + $ref: "#/definitions/Platforms" + info: + description: "Definition of meta data" + $ref: "#/definitions/Info" + required: + - "functionality" + additionalProperties: false + NativePlatform: + description: "Running a Viash component on a native platform means that the script\ + \ will be executed in your current environment.\nAny dependencies are assumed\ + \ to have been installed by the user, so the native platform is meant for developers\ + \ (who know what they're doing) or for simple bash scripts (which have no extra\ + \ dependencies).\n" + type: "object" + properties: + id: + description: "As with all platforms, you can give a platform a different name.\ + \ By specifying `id: foo`, you can target this platform (only) by specifying\ + \ `-p foo` in any of the Viash commands." + type: "string" + type: + description: "Running a Viash component on a native platform means that the\ + \ script will be executed in your current environment.\nAny dependencies\ + \ are assumed to have been installed by the user, so the native platform\ + \ is meant for developers (who know what they're doing) or for simple bash\ + \ scripts (which have no extra dependencies).\n" + const: "native" + required: + - "type" + additionalProperties: false + DockerPlatform: + description: "Run a Viash component on a Docker backend platform.\nBy specifying\ + \ which dependencies your component needs, users will be able to build a docker\ + \ container from scratch using the setup flag, or pull it from a docker repository.\n" + type: "object" + properties: + organization: + description: "Name of a container's [organization](https://docs.docker.com/docker-hub/orgs/)." + type: "string" + registry: + description: "The URL to the a [custom Docker registry](https://docs.docker.com/registry/)" + type: "string" + image: + description: "The base container to start from. You can also add the tag here\ + \ if you wish." + type: "string" + tag: + description: "Specify a Docker image based on its tag." + type: "string" + target_tag: + description: "The tag the resulting image gets. Advanced usage only." + type: "string" + run_args: + anyOf: + - description: "Add [docker run](https://docs.docker.com/engine/reference/run/)\ + \ arguments." + type: "string" + - description: "Add [docker run](https://docs.docker.com/engine/reference/run/)\ + \ arguments." + type: "array" + items: + type: "string" + namespace_separator: + description: "The separator between the namespace and the name of the component,\ + \ used for determining the image name. Default: `\"/\"`." + type: "string" + resolve_volume: + description: "Enables or disables automatic volume mapping. Enabled when set\ + \ to `Automatic` or disabled when set to `Manual`. Default: `Automatic`." + $ref: "#/definitions/DockerResolveVolume" + port: + anyOf: + - description: "A list of enabled ports. This doesn't change the Dockerfile\ + \ but gets added as a command-line argument at runtime." + type: "string" + - description: "A list of enabled ports. This doesn't change the Dockerfile\ + \ but gets added as a command-line argument at runtime." 
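+# Illustrative sketch of a Docker platform entry (the image and packages are hypothetical):
+#   - type: docker
+#     image: python:3.10
+#     setup:
+#       - type: python
+#         packages: [ scanpy ]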
+ type: "array" + items: + type: "string" + setup: + description: "A list of requirements for installing the following types of\ + \ packages:\n\n - @[apt](apt_req)\n - @[apk](apk_req)\n - @[Docker setup\ + \ instructions](docker_req)\n - @[JavaScript](javascript_req)\n - @[Python](python_req)\n\ + \ - @[R](r_req)\n - @[Ruby](ruby_req)\n - @[yum](yum_req)\n\nThe order in\ + \ which these dependencies are specified determines the order in which they\ + \ will be installed.\n" + type: "array" + items: + $ref: "#/definitions/Requirements" + workdir: + description: "The working directory when starting the container. This doesn't\ + \ change the Dockerfile but gets added as a command-line argument at runtime." + type: "string" + target_image: + description: "If anything is specified in the setup section, running the `---setup`\ + \ will result in an image with the name of `:`. If\ + \ nothing is specified in the `setup` section, simply `image` will be used.\ + \ Advanced usage only." + type: "string" + cmd: + anyOf: + - description: "Set the default command being executed when running the Docker\ + \ container." + type: "string" + - description: "Set the default command being executed when running the Docker\ + \ container." + type: "array" + items: + type: "string" + target_image_source: + description: "The source of the target image. This is used for defining labels\ + \ in the dockerfile." + type: "string" + test_setup: + description: "Additional requirements specific for running unit tests." + type: "array" + items: + $ref: "#/definitions/Requirements" + entrypoint: + anyOf: + - description: "Override the entrypoint of the base container. Default set\ + \ `ENTRYPOINT []`." + type: "string" + - description: "Override the entrypoint of the base container. Default set\ + \ `ENTRYPOINT []`." + type: "array" + items: + type: "string" + id: + description: "As with all platforms, you can give a platform a different name.\ + \ By specifying `id: foo`, you can target this platform (only) by specifying\ + \ `-p foo` in any of the Viash commands." + type: "string" + target_registry: + description: "The URL where the resulting image will be pushed to. Advanced\ + \ usage only." + type: "string" + setup_strategy: + description: "The Docker setup strategy to use when building a container.\n\ + \n| Strategy | Description |\n|-----|----------|\n| `alwaysbuild` / `build`\ + \ / `b` | Always build the image from the dockerfile. This is the default\ + \ setup strategy.\n| `alwayscachedbuild` / `cachedbuild` / `cb` | Always\ + \ build the image from the dockerfile, with caching enabled.\n| `ifneedbebuild`\ + \ | Build the image if it does not exist locally.\n| `ifneedbecachedbuild`\ + \ | Build the image with caching enabled if it does not exist locally, with\ + \ caching enabled.\n| `alwayspull` / `pull` / `p` | Try to pull the container\ + \ from [Docker Hub](https://hub.docker.com) or the @[specified docker registry](docker_registry).\n\ + | `alwayspullelsebuild` / `pullelsebuild` | Try to pull the image from\ + \ a registry and build it if it doesn't exist.\n| `alwayspullelsecachedbuild`\ + \ / `pullelsecachedbuild` | Try to pull the image from a registry and build\ + \ it with caching if it doesn't exist.\n| `ifneedbepull` | If the image\ + \ does not exist locally, pull the image.\n| `ifneedbepullelsebuild` | \ + \ If the image does not exist locally, pull the image. If the image does\ + \ exist, build it.\n| `ifneedbepullelsecachedbuild` | If the image does\ + \ not exist locally, pull the image. 
If the image does exist, build it with\ + \ caching enabled.\n| `push` | Push the container to [Docker Hub](https://hub.docker.com)\ + \ or the @[specified docker registry](docker_registry).\n| `pushifnotpresent`\ + \ | Push the container to [Docker Hub](https://hub.docker.com) or the @[specified\ + \ docker registry](docker_registry) if the @[tag](docker_tag) does not exist\ + \ yet.\n| `donothing` / `meh` | Do not build or pull anything.\n\n" + $ref: "#/definitions/DockerSetupStrategy" + type: + description: "Run a Viash component on a Docker backend platform.\nBy specifying\ + \ which dependencies your component needs, users will be able to build a\ + \ docker container from scratch using the setup flag, or pull it from a\ + \ docker repository.\n" + const: "docker" + target_organization: + description: "The organization set in the resulting image. Advanced usage\ + \ only." + type: "string" + chown: + description: "In Linux, files created by a Docker container will be owned\ + \ by `root`. With `chown: true`, Viash will automatically change the ownership\ + \ of output files (arguments with `type: file` and `direction: output`)\ + \ to the user running the Viash command after execution of the component.\ + \ Default value: `true`." + type: "boolean" + required: + - "image" + - "type" + additionalProperties: false + NextflowVdsl3Platform: + description: "Next-gen platform for generating NextFlow VDSL3 modules." + type: "object" + properties: + auto: + description: "@[Automated processing flags](nextflow_auto) which can be toggled\ + \ on or off:\n\n| Flag | Description | Default |\n|---|---------|----|\n\ + | `simplifyInput` | If `true`, an input tuple only containing only a single\ + \ File (e.g. `[\"foo\", file(\"in.h5ad\")]`) is automatically transformed\ + \ to a map (i.e. `[\"foo\", [ input: file(\"in.h5ad\") ] ]`). | `true` |\n\ + | `simplifyOutput` | If `true`, an output tuple containing a map with a\ + \ File (e.g. `[\"foo\", [ output: file(\"out.h5ad\") ] ]`) is automatically\ + \ transformed to a map (i.e. `[\"foo\", file(\"out.h5ad\")]`). | `true`\ + \ |\n| `transcript` | If `true`, the module's transcripts from `work/` are\ + \ automatically published to `params.transcriptDir`. If not defined, `params.publishDir\ + \ + \"/_transcripts\"` will be used. Will throw an error if neither are\ + \ defined. | `false` |\n| `publish` | If `true`, the module's outputs are\ + \ automatically published to `params.publishDir`. Will throw an error if\ + \ `params.publishDir` is not defined. | `false` |\n\n" + $ref: "#/definitions/NextflowAuto" + directives: + description: "@[Directives](nextflow_directives) are optional settings that\ + \ affect the execution of the process. These mostly match up with the Nextflow\ + \ counterparts. \n" + $ref: "#/definitions/NextflowDirectives" + container: + description: "Specifies the Docker platform id to be used to run Nextflow." + type: "string" + debug: + description: "Whether or not to print debug messages." + type: "boolean" + id: + description: "Every platform can be given a specific id that can later be\ + \ referred to explicitly when running or building the Viash component." + type: "string" + type: + description: "Next-gen platform for generating NextFlow VDSL3 modules." + const: "nextflow" + config: + description: "Allows tweaking how the @[Nextflow Config](nextflow_config)\ + \ file is generated." 
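+# Illustrative sketch of a Nextflow (VDSL3) platform entry, assuming the `label`
+# directive is available; the label values are hypothetical:
+#   - type: nextflow
+#     directives:
+#       label: [ midmem, midcpu ]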
+ $ref: "#/definitions/NextflowConfig" + required: + - "type" + additionalProperties: false + Platforms: + anyOf: + - $ref: "#/definitions/NativePlatform" + - $ref: "#/definitions/DockerPlatform" + - $ref: "#/definitions/NextflowVdsl3Platform" + Info: + description: "Meta information fields filled in by Viash during build." + type: "object" + properties: + git_tag: + description: "Git tag." + type: "string" + git_remote: + description: "Git remote name." + type: "string" + viash_version: + description: "The Viash version that was used to build the component." + type: "string" + config: + description: "Path to the config used during build." + type: "string" + output: + description: "Folder path to the build artifacts." + type: "string" + platform: + description: "The platform id used during build." + type: "string" + git_commit: + description: "Git commit hash." + type: "string" + executable: + description: "Output folder with main executable path." + type: "string" + required: + - "config" + additionalProperties: false + Functionality: + description: "The functionality-part of the config file describes the behaviour\ + \ of the script in terms of arguments and resources.\nBy specifying a few restrictions\ + \ (e.g. mandatory arguments) and adding some descriptions, Viash will automatically\ + \ generate a stylish command-line interface for you.\n" + type: "object" + properties: + name: + description: "Name of the component and the filename of the executable when\ + \ built with `viash build`." + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + version: + description: "Version of the component. This field will be used to version\ + \ the executable and the Docker container." + type: "string" + authors: + description: "A list of @[authors](author). An author must at least have a\ + \ name, but can also have a list of roles, an e-mail address, and a map\ + \ of custom properties.\n\nSuggested values for roles are:\n \n| Role |\ + \ Abbrev. | Description |\n|------|---------|-------------|\n| maintainer\ + \ | mnt | for the maintainer of the code. Ideally, exactly one maintainer\ + \ is specified. |\n| author | aut | for persons who have made substantial\ + \ contributions to the software. |\n| contributor | ctb| for persons who\ + \ have made smaller contributions (such as code patches).\n| datacontributor\ + \ | dtc | for persons or organisations that contributed data sets for the\ + \ software\n| copyrightholder | cph | for all copyright holders. This is\ + \ a legal concept so should use the legal name of an institution or corporate\ + \ body.\n| funder | fnd | for persons or organizations that furnished financial\ + \ support for the development of the software\n\nThe [full list of roles](https://www.loc.gov/marc/relators/relaterm.html)\ + \ is extremely comprehensive.\n" + type: "array" + items: + $ref: "#/definitions/Author" + status: + description: "Allows setting a component to active, deprecated or disabled." + $ref: "#/definitions/Status" + requirements: + description: "@[Computational requirements](computational_requirements) related\ + \ to running the component. \n`cpus` specifies the maximum number of (logical)\ + \ cpus a component is allowed to use., whereas\n`memory` specifies the maximum\ + \ amount of memory a component is allowed to allicate. Memory units must\ + \ be\nin B, KB, MB, GB, TB or PB." 
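+# Illustrative sketch of a computational requirements entry as described above
+# (the values are hypothetical):
+#   requirements:
+#     cpus: 4
+#     memory: 16GB
+#     commands: [ bash ]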
+ $ref: "#/definitions/ComputationalRequirements" + resources: + description: "@[Resources](resources) are files that support the component.\ + \ The first resource should be @[a script](scripting_languages) that will\ + \ be executed when the functionality is run. Additional resources will be\ + \ copied to the same directory.\n\nCommon properties:\n\n * type: `file`\ + \ / `r_script` / `python_script` / `bash_script` / `javascript_script` /\ + \ `scala_script` / `csharp_script`, specifies the type of the resource.\ + \ The first resource cannot be of type `file`. When the type is not specified,\ + \ the default type is simply `file`.\n * dest: filename, the resulting name\ + \ of the resource. From within a script, the file can be accessed at `meta[\"\ + resources_dir\"] + \"/\" + dest`. If unspecified, `dest` will be set to\ + \ the basename of the `path` parameter.\n * path: `path/to/file`, the path\ + \ of the input file. Can be a relative or an absolute path, or a URI. Mutually\ + \ exclusive with `text`.\n * text: ...multiline text..., the content of\ + \ the resulting file specified as a string. Mutually exclusive with `path`.\n\ + \ * is_executable: `true` / `false`, whether the resulting resource file\ + \ should be made executable.\n" + type: "array" + items: + $ref: "#/definitions/Resource" + test_resources: + description: "One or more @[scripts](scripting_languages) to be used to test\ + \ the component behaviour when `viash test` is invoked. Additional files\ + \ of type `file` will be made available only during testing. Each test script\ + \ should expect no command-line inputs, be platform-independent, and return\ + \ an exit code >0 when unexpected behaviour occurs during testing. See @[Unit\ + \ Testing](unit_testing) for more info." + type: "array" + items: + $ref: "#/definitions/Resource" + argument_groups: + description: "A grouping of the @[arguments](argument), used to display the\ + \ help message.\n\n - `name: foo`, the name of the argument group. \n -\ + \ `description: Description of foo`, a description of the argument group.\ + \ Multiline descriptions are supported.\n - `arguments: [arg1, arg2, ...]`,\ + \ list of the arguments names.\n\n" + type: "array" + items: + $ref: "#/definitions/ArgumentGroup" + description: + description: "A description of the component. This will be displayed with\ + \ `--help`." + type: "string" + usage: + description: "A description on how to use the component. This will be displayed\ + \ with `--help` under the 'Usage:' section." + type: "string" + namespace: + description: "Namespace this component is a part of. See the @[Namespaces\ + \ guide](namespace) for more information on namespaces." + type: "string" + arguments: + description: "A list of @[arguments](argument) for this component. For each\ + \ argument, a type and a name must be specified. Depending on the type of\ + \ argument, different properties can be set. See these reference pages per\ + \ type for more information: \n\n - @[string](arg_string)\n - @[file](arg_file)\n\ + \ - @[integer](arg_integer)\n - @[double](arg_double)\n - @[boolean](arg_boolean)\n\ + \ - @[boolean_true](arg_boolean_true)\n - @[boolean_false](arg_boolean_false)\n" + type: "array" + items: + $ref: "#/definitions/Argument" + required: + - "name" + additionalProperties: false + Author: + description: "Author metadata." + type: "object" + properties: + name: + description: "Full name of the author, usually in the name of FirstName MiddleName\ + \ LastName." 
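+# Illustrative sketch of an author entry (the person and handles are hypothetical):
+#   authors:
+#     - name: Jane Doe
+#       roles: [ author, maintainer ]
+#       info:
+#         github: janedoe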
+ type: "string" + email: + description: "E-mail of the author." + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + roles: + anyOf: + - description: "Role of the author. Suggested items:\n\n* `\"author\"`: Authors\ + \ who have made substantial contributions to the component.\n* `\"maintainer\"\ + `: The maintainer of the component.\n* `\"contributor\"`: Authors who\ + \ have made smaller contributions (such as code patches etc.).\n" + type: "string" + - description: "Role of the author. Suggested items:\n\n* `\"author\"`: Authors\ + \ who have made substantial contributions to the component.\n* `\"maintainer\"\ + `: The maintainer of the component.\n* `\"contributor\"`: Authors who\ + \ have made smaller contributions (such as code patches etc.).\n" + type: "array" + items: + type: "string" + props: + description: "Author properties. Must be a map of strings." + type: "object" + additionalProperties: + description: "Author properties. Must be a map of strings." + type: "string" + required: + - "name" + additionalProperties: false + ComputationalRequirements: + description: "Computational requirements related to running the component." + type: "object" + properties: + cpus: + description: "The maximum number of (logical) cpus a component is allowed\ + \ to use." + type: "integer" + commands: + description: "A list of commands which should be present on the system for\ + \ the script to function." + type: "array" + items: + type: "string" + memory: + description: "The maximum amount of memory a component is allowed to allocate.\ + \ Unit must be one of B, KB, MB, GB, TB or PB." + type: "string" + required: [] + additionalProperties: false + RubyRequirements: + description: "Specify which Ruby packages should be available in order to run\ + \ the component." + type: "object" + properties: + type: + description: "Specify which Ruby packages should be available in order to\ + \ run the component." + const: "ruby" + packages: + anyOf: + - description: "Specifies which packages to install." + type: "string" + - description: "Specifies which packages to install." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + YumRequirements: + description: "Specify which yum packages should be available in order to run the\ + \ component." + type: "object" + properties: + type: + description: "Specify which yum packages should be available in order to run\ + \ the component." + const: "yum" + packages: + anyOf: + - description: "Specifies which packages to install." + type: "string" + - description: "Specifies which packages to install." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + JavascriptRequirements: + description: "Specify which JavaScript packages should be available in order to\ + \ run the component." + type: "object" + properties: + github: + anyOf: + - description: "Specifies which packages to install from GitHub." + type: "string" + - description: "Specifies which packages to install from GitHub." + type: "array" + items: + type: "string" + url: + anyOf: + - description: "Specifies which packages to install using a generic URI." + type: "string" + - description: "Specifies which packages to install using a generic URI." + type: "array" + items: + type: "string" + git: + anyOf: + - description: "Specifies which packages to install using a Git URI." 
+ type: "string" + - description: "Specifies which packages to install using a Git URI." + type: "array" + items: + type: "string" + npm: + anyOf: + - description: "Specifies which packages to install from npm." + type: "string" + - description: "Specifies which packages to install from npm." + type: "array" + items: + type: "string" + type: + description: "Specify which JavaScript packages should be available in order\ + \ to run the component." + const: "javascript" + packages: + anyOf: + - description: "Specifies which packages to install from npm." + type: "string" + - description: "Specifies which packages to install from npm." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + DockerRequirements: + description: "Specify which Docker commands should be run during setup." + type: "object" + properties: + run: + anyOf: + - description: "Specifies which `RUN` entries to add to the Dockerfile while\ + \ building it." + type: "string" + - description: "Specifies which `RUN` entries to add to the Dockerfile while\ + \ building it." + type: "array" + items: + type: "string" + label: + anyOf: + - description: "Specifies which `LABEL` entries to add to the Dockerfile while\ + \ building it." + type: "string" + - description: "Specifies which `LABEL` entries to add to the Dockerfile while\ + \ building it." + type: "array" + items: + type: "string" + build_args: + anyOf: + - description: "Specifies which `ARG` entries to add to the Dockerfile while\ + \ building it." + type: "string" + - description: "Specifies which `ARG` entries to add to the Dockerfile while\ + \ building it." + type: "array" + items: + type: "string" + type: + description: "Specify which Docker commands should be run during setup." + const: "docker" + add: + anyOf: + - description: "Specifies which `ADD` entries to add to the Dockerfile while\ + \ building it." + type: "string" + - description: "Specifies which `ADD` entries to add to the Dockerfile while\ + \ building it." + type: "array" + items: + type: "string" + env: + anyOf: + - description: "Specifies which `ENV` entries to add to the Dockerfile while\ + \ building it. Unlike `ARG`, `ENV` entries are also accessible from inside\ + \ the container." + type: "string" + - description: "Specifies which `ENV` entries to add to the Dockerfile while\ + \ building it. Unlike `ARG`, `ENV` entries are also accessible from inside\ + \ the container." + type: "array" + items: + type: "string" + copy: + anyOf: + - description: "Specifies which `COPY` entries to add to the Dockerfile while\ + \ building it." + type: "string" + - description: "Specifies which `COPY` entries to add to the Dockerfile while\ + \ building it." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + RRequirements: + description: "Specify which R packages should be available in order to run the\ + \ component." + type: "object" + properties: + bioc: + anyOf: + - description: "Specifies which packages to install from BioConductor." + type: "string" + - description: "Specifies which packages to install from BioConductor." + type: "array" + items: + type: "string" + github: + anyOf: + - description: "Specifies which packages to install from GitHub." + type: "string" + - description: "Specifies which packages to install from GitHub." + type: "array" + items: + type: "string" + gitlab: + anyOf: + - description: "Specifies which packages to install from GitLab." 
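+# Illustrative sketch of an R requirements entry (the package names are hypothetical):
+#   setup:
+#     - type: r
+#       cran: [ anndata ]
+#       bioc: [ batchelor ]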
+ type: "string" + - description: "Specifies which packages to install from GitLab." + type: "array" + items: + type: "string" + url: + anyOf: + - description: "Specifies which packages to install using a generic URI." + type: "string" + - description: "Specifies which packages to install using a generic URI." + type: "array" + items: + type: "string" + bioc_force_install: + description: "Forces packages specified in `bioc` to be reinstalled, even\ + \ if they are already present in the container. Default: false." + type: "boolean" + git: + anyOf: + - description: "Specifies which packages to install using a Git URI." + type: "string" + - description: "Specifies which packages to install using a Git URI." + type: "array" + items: + type: "string" + cran: + anyOf: + - description: "Specifies which packages to install from CRAN." + type: "string" + - description: "Specifies which packages to install from CRAN." + type: "array" + items: + type: "string" + bitbucket: + anyOf: + - description: "Specifies which packages to install from Bitbucket." + type: "string" + - description: "Specifies which packages to install from Bitbucket." + type: "array" + items: + type: "string" + svn: + anyOf: + - description: "Specifies which packages to install using an SVN URI." + type: "string" + - description: "Specifies which packages to install using an SVN URI." + type: "array" + items: + type: "string" + packages: + anyOf: + - description: "Specifies which packages to install from CRAN." + type: "string" + - description: "Specifies which packages to install from CRAN." + type: "array" + items: + type: "string" + script: + anyOf: + - description: "Specifies a code block to run as part of the build." + type: "string" + - description: "Specifies a code block to run as part of the build." + type: "array" + items: + type: "string" + type: + description: "Specify which R packages should be available in order to run\ + \ the component." + const: "r" + required: + - "type" + additionalProperties: false + ApkRequirements: + description: "Specify which apk packages should be available in order to run the\ + \ component." + type: "object" + properties: + type: + description: "Specify which apk packages should be available in order to run\ + \ the component." + const: "apk" + packages: + anyOf: + - description: "Specifies which packages to install." + type: "string" + - description: "Specifies which packages to install." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + PythonRequirements: + description: "Specify which Python packages should be available in order to run\ + \ the component." + type: "object" + properties: + github: + anyOf: + - description: "Specifies which packages to install from GitHub." + type: "string" + - description: "Specifies which packages to install from GitHub." + type: "array" + items: + type: "string" + gitlab: + anyOf: + - description: "Specifies which packages to install from GitLab." + type: "string" + - description: "Specifies which packages to install from GitLab." + type: "array" + items: + type: "string" + pip: + anyOf: + - description: "Specifies which packages to install from pip." + type: "string" + - description: "Specifies which packages to install from pip." + type: "array" + items: + type: "string" + pypi: + anyOf: + - description: "Specifies which packages to install from PyPI using pip." + type: "string" + - description: "Specifies which packages to install from PyPI using pip." 
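+# Illustrative sketch of a Python requirements entry (package names and versions are hypothetical):
+#   setup:
+#     - type: python
+#       pypi: [ scanpy, "anndata~=0.8" ]
+#       github: [ some-org/some-package ]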
+ type: "array" + items: + type: "string" + git: + anyOf: + - description: "Specifies which packages to install using a Git URI." + type: "string" + - description: "Specifies which packages to install using a Git URI." + type: "array" + items: + type: "string" + upgrade: + description: "Sets the `--upgrade` flag when set to true. Default: true." + type: "boolean" + packages: + anyOf: + - description: "Specifies which packages to install from pip." + type: "string" + - description: "Specifies which packages to install from pip." + type: "array" + items: + type: "string" + url: + anyOf: + - description: "Specifies which packages to install using a generic URI." + type: "string" + - description: "Specifies which packages to install using a generic URI." + type: "array" + items: + type: "string" + svn: + anyOf: + - description: "Specifies which packages to install using an SVN URI." + type: "string" + - description: "Specifies which packages to install using an SVN URI." + type: "array" + items: + type: "string" + bazaar: + anyOf: + - description: "Specifies which packages to install using a Bazaar URI." + type: "string" + - description: "Specifies which packages to install using a Bazaar URI." + type: "array" + items: + type: "string" + script: + anyOf: + - description: "Specifies a code block to run as part of the build." + type: "string" + - description: "Specifies a code block to run as part of the build." + type: "array" + items: + type: "string" + type: + description: "Specify which Python packages should be available in order to\ + \ run the component." + const: "python" + mercurial: + anyOf: + - description: "Specifies which packages to install using a Mercurial URI." + type: "string" + - description: "Specifies which packages to install using a Mercurial URI." + type: "array" + items: + type: "string" + user: + description: "Sets the `--user` flag when set to true. Default: false." + type: "boolean" + required: + - "type" + additionalProperties: false + AptRequirements: + description: "Specify which apt packages should be available in order to run the\ + \ component." + type: "object" + properties: + interactive: + description: "If `false`, the Debian frontend is set to non-interactive (recommended).\ + \ Default: false." + type: "boolean" + type: + description: "Specify which apt packages should be available in order to run\ + \ the component." + const: "apt" + packages: + anyOf: + - description: "Specifies which packages to install." + type: "string" + - description: "Specifies which packages to install." + type: "array" + items: + type: "string" + required: + - "type" + additionalProperties: false + Requirements: + anyOf: + - $ref: "#/definitions/RubyRequirements" + - $ref: "#/definitions/YumRequirements" + - $ref: "#/definitions/JavascriptRequirements" + - $ref: "#/definitions/DockerRequirements" + - $ref: "#/definitions/RRequirements" + - $ref: "#/definitions/ApkRequirements" + - $ref: "#/definitions/PythonRequirements" + - $ref: "#/definitions/AptRequirements" + StringArgument: + description: "A `string` type argument has a value made up of an ordered sequences\ + \ of characters, like \"Hello\" or \"I'm a string\"." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f`\ + \ or `foo`. 
The number of dashes determines how values can be passed: \n\ + \n - `--foo` is a long option, which can be passed with `executable_name\ + \ --foo=value` or `executable_name --foo value`\n - `-f` is a short option,\ + \ which can be passed with `executable_name -f value`\n - `foo` is an argument,\ + \ which can be passed with `executable_name value` \n" + type: "string" + choices: + description: "Limit the amount of valid values for this argument to those\ + \ set in this list. When set and a value not present in the list is provided,\ + \ an error will be produced." + type: "array" + items: + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + default: + anyOf: + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "string" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "string" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "string" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "string" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "A `string` type argument has a value made up of an ordered sequences\ + \ of characters, like \"Hello\" or \"I'm a string\"." + const: "string" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + BooleanArgument: + description: "A `boolean` type argument has two possible values: `true` or `false`." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--trim`, `-t`\ + \ or `trim`. The number of dashes determines how values can be passed: \ + \ \n\n - `--trim` is a long option, which can be passed with `executable_name\ + \ --trim`\n - `-t` is a short option, which can be passed with `executable_name\ + \ -t`\n - `trim` is an argument, which can be passed with `executable_name\ + \ trim` \n" + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + default: + anyOf: + - description: "The default value when no argument value is provided. 
This\ + \ will not work if the [`required`](#required) property is enabled." + type: "boolean" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "boolean" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "boolean" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "boolean" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "A `boolean` type argument has two possible values: `true` or\ + \ `false`." + const: "boolean" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + BooleanTrueArgument: + description: "An argument of the `boolean_true` type acts like a `boolean` flag\ + \ with a default value of `false`. When called as an argument it sets the `boolean`\ + \ to `true`." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--silent`,\ + \ `-s` or `silent`. The number of dashes determines how values can be passed:\ + \ \n\n - `--silent` is a long option, which can be passed with `executable_name\ + \ --silent`\n - `-s` is a short option, which can be passed with `executable_name\ + \ -s`\n - `silent` is an argument, which can be passed with `executable_name\ + \ silent` \n" + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + type: + description: "An argument of the `boolean_true` type acts like a `boolean`\ + \ flag with a default value of `false`. When called as an argument it sets\ + \ the `boolean` to `true`." + const: "boolean_true" + required: + - "name" + - "type" + additionalProperties: false + IntegerArgument: + description: "An `integer` type argument has a numeric value without decimal points." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f`\ + \ or `foo`. 
The number of dashes determines how values can be passed: \n\ + \n - `--foo` is a long option, which can be passed with `executable_name\ + \ --foo=value` or `executable_name --foo value`\n - `-f` is a short option,\ + \ which can be passed with `executable_name -f value`\n - `foo` is an argument,\ + \ which can be passed with `executable_name value` \n" + type: "string" + choices: + description: "Limit the amount of valid values for this argument to those\ + \ set in this list. When set and a value not present in the list is provided,\ + \ an error will be produced." + type: "array" + items: + type: "integer" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + max: + description: "Maximum allowed value for this argument. If set and the provided\ + \ value is higher than the maximum, an error will be produced. Can be combined\ + \ with [`min`](#min) to clamp values." + type: "integer" + default: + anyOf: + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "integer" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "integer" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "integer" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "integer" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + min: + description: "Minimum allowed value for this argument. If set and the provided\ + \ value is lower than the minimum, an error will be produced. Can be combined\ + \ with [`max`](#max) to clamp values." + type: "integer" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "An `integer` type argument has a numeric value without decimal\ + \ points." + const: "integer" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + LongArgument: + description: "An `long` type argument has a numeric value without decimal points." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f`\ + \ or `foo`. 
The number of dashes determines how values can be passed: \n\ + \n - `--foo` is a long option, which can be passed with `executable_name\ + \ --foo=value` or `executable_name --foo value`\n - `-f` is a short option,\ + \ which can be passed with `executable_name -f value`\n - `foo` is an argument,\ + \ which can be passed with `executable_name value` \n" + type: "string" + choices: + description: "Limit the amount of valid values for this argument to those\ + \ set in this list. When set and a value not present in the list is provided,\ + \ an error will be produced." + type: "array" + items: + type: "integer" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + max: + description: "Maximum allowed value for this argument. If set and the provided\ + \ value is higher than the maximum, an error will be produced. Can be combined\ + \ with [`min`](#min) to clamp values." + type: "integer" + default: + anyOf: + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "integer" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "integer" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "integer" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "integer" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + min: + description: "Minimum allowed value for this argument. If set and the provided\ + \ value is lower than the minimum, an error will be produced. Can be combined\ + \ with [`max`](#max) to clamp values." + type: "integer" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "An `long` type argument has a numeric value without decimal\ + \ points." + const: "long" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + BooleanFalseArgument: + description: "An argument of the `boolean_false` type acts like an inverted `boolean`\ + \ flag with a default value of `true`. When called as an argument it sets the\ + \ `boolean` to `false`." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--no-log`,\ + \ `-n` or `no-log`. 
The number of dashes determines how values can be passed:\ + \ \n\n - `--no-log` is a long option, which can be passed with `executable_name\ + \ --no-log`\n - `-n` is a short option, which can be passed with `executable_name\ + \ -n`\n - `no-log` is an argument, which can be passed with `executable_name\ + \ no-log` \n" + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + type: + description: "An argument of the `boolean_false` type acts like an inverted\ + \ `boolean` flag with a default value of `true`. When called as an argument\ + \ it sets the `boolean` to `false`." + const: "boolean_false" + required: + - "name" + - "type" + additionalProperties: false + DoubleArgument: + description: "A `double` type argument has a numeric value with decimal points" + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f`\ + \ or `foo`. The number of dashes determines how values can be passed: \n\ + \n - `--foo` is a long option, which can be passed with `executable_name\ + \ --foo=value` or `executable_name --foo value`\n - `-f` is a short option,\ + \ which can be passed with `executable_name -f value`\n - `foo` is an argument,\ + \ which can be passed with `executable_name value` \n" + type: "string" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + max: + description: "Maximum allowed value for this argument. If set and the provided\ + \ value is higher than the maximum, an error will be produced. Can be combined\ + \ with [`min`](#min) to clamp values." + type: "number" + default: + anyOf: + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "number" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "number" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "number" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "number" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + min: + description: "Minimum allowed value for this argument. If set and the provided\ + \ value is lower than the minimum, an error will be produced. Can be combined\ + \ with [`max`](#max) to clamp values." + type: "number" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. 
You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "A `double` type argument has a numeric value with decimal points" + const: "double" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + FileArgument: + description: "A `file` type argument has a string value that points to a file\ + \ or folder path." + type: "object" + properties: + alternatives: + anyOf: + - description: "List of alternative format variations for this argument." + type: "string" + - description: "List of alternative format variations for this argument." + type: "array" + items: + type: "string" + name: + description: "The name of the argument. Can be in the formats `--foo`, `-f`\ + \ or `foo`. The number of dashes determines how values can be passed: \n\ + \n - `--foo` is a long option, which can be passed with `executable_name\ + \ --foo=value` or `executable_name --foo value`\n - `-f` is a short option,\ + \ which can be passed with `executable_name -f value`\n - `foo` is an argument,\ + \ which can be passed with `executable_name value` \n" + type: "string" + create_parent: + description: "If the output filename is a path and it does not exist, create\ + \ it before executing the script (only for `direction: output`)." + type: "boolean" + direction: + description: "Makes this argument an `input` or an `output`, as in does the\ + \ file/folder needs to be read or written. `input` by default." + $ref: "#/definitions/Direction" + info: + description: "Structured information. Can be any shape: a string, vector,\ + \ map or even nested map." + type: "object" + must_exist: + description: "Checks whether the file or folder exists. For input files, this\ + \ check will happen before the execution of the script, while for output\ + \ files the check will happen afterwards." + type: "boolean" + default: + anyOf: + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "string" + - description: "The default value when no argument value is provided. This\ + \ will not work if the [`required`](#required) property is enabled." + type: "array" + items: + type: "string" + example: + anyOf: + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "string" + - description: "An example value for this argument. If no [`default`](#default)\ + \ property was specified, this will be used for that purpose." + type: "array" + items: + type: "string" + description: + description: "A description of the argument. This will be displayed with `--help`." + type: "string" + multiple_sep: + description: "The delimiter character for providing [`multiple`](#multiple)\ + \ values. `:` by default." + type: "string" + multiple: + description: "Treat the argument value as an array. Arrays can be passed using\ + \ the delimiter `--foo=1:2:3` or by providing the same argument multiple\ + \ times `--foo 1 --foo 2`. You can use a custom delimiter by using the [`multiple_sep`](#multiple_sep)\ + \ property. `false` by default." + type: "boolean" + type: + description: "A `file` type argument has a string value that points to a file\ + \ or folder path." 
+ const: "file" + required: + description: "Make the value for this argument required. If set to `true`,\ + \ an error will be produced if no value was provided. `false` by default." + type: "boolean" + required: + - "name" + - "type" + additionalProperties: false + Argument: + anyOf: + - $ref: "#/definitions/StringArgument" + - $ref: "#/definitions/BooleanArgument" + - $ref: "#/definitions/BooleanTrueArgument" + - $ref: "#/definitions/IntegerArgument" + - $ref: "#/definitions/LongArgument" + - $ref: "#/definitions/BooleanFalseArgument" + - $ref: "#/definitions/DoubleArgument" + - $ref: "#/definitions/FileArgument" + ArgumentGroup: + type: "object" + properties: + name: + description: "The name of the argument group." + type: "string" + description: + description: "A description of the argument group. Multiline descriptions\ + \ are supported." + type: "string" + arguments: + description: "List of the arguments names." + type: "array" + items: + $ref: "#/definitions/Argument" + required: + - "name" + - "arguments" + additionalProperties: false + JavaScriptScript: + description: "An executable JavaScript script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable JavaScript script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." + const: "javascript_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + CSharpScript: + description: "An executable C# script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable C# script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." 
+ const: "csharp_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + Executable: + description: "An executable file." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable file." + const: "executable" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + ScalaScript: + description: "An executable Scala script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable Scala script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." + const: "scala_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + NextflowScript: + description: "A Nextflow script. Work in progress; added mainly for annotation\ + \ at the moment." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + entrypoint: + description: "The name of the workflow to be executed." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "A Nextflow script. Work in progress; added mainly for annotation\ + \ at the moment." + const: "nextflow_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. 
If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + PlainFile: + description: "A plain file. This can only be used as a supporting resource for\ + \ the main script or unit tests." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "A plain file. This can only be used as a supporting resource\ + \ for the main script or unit tests." + const: "file" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "path" + additionalProperties: false + BashScript: + description: "An executable Bash script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable Bash script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." + const: "bash_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + PythonScript: + description: "An executable Python script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." 
+ type: "boolean" + type: + description: "An executable Python script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." + const: "python_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + RScript: + description: "An executable R script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component or\ + \ when running `viash run`.\nWhen defined in functionality.test_resources, all\ + \ entries will be executed during `viash test`." + type: "object" + properties: + path: + description: "The path of the input file. Can be a relative or an absolute\ + \ path, or a URI. Mutually exclusive with `text`." + type: "string" + text: + description: "The content of the resulting file specified as a string. Mutually\ + \ exclusive with `path`." + type: "string" + is_executable: + description: "Whether the resulting resource file should be made executable." + type: "boolean" + type: + description: "An executable R script.\nWhen defined in functionality.resources,\ + \ only the first entry will be executed when running the built component\ + \ or when running `viash run`.\nWhen defined in functionality.test_resources,\ + \ all entries will be executed during `viash test`." + const: "r_script" + dest: + description: "Resulting filename of the resource. From within a script, the\ + \ file can be accessed at `meta[\"resources_dir\"] + \"/\" + dest`. If unspecified,\ + \ `dest` will be set to the basename of the `path` parameter." + type: "string" + required: + - "type" + additionalProperties: false + Resource: + anyOf: + - $ref: "#/definitions/JavaScriptScript" + - $ref: "#/definitions/CSharpScript" + - $ref: "#/definitions/Executable" + - $ref: "#/definitions/ScalaScript" + - $ref: "#/definitions/NextflowScript" + - $ref: "#/definitions/PlainFile" + - $ref: "#/definitions/BashScript" + - $ref: "#/definitions/PythonScript" + - $ref: "#/definitions/RScript" + NextflowDirectives: + description: "Directives are optional settings that affect the execution of the\ + \ process.\n" + type: "object" + properties: + beforeScript: + description: "The `beforeScript` directive allows you to execute a custom\ + \ (Bash) snippet before the main process script is run. 
This may be useful\ + \ to initialise the underlying cluster environment or for other custom initialisation.\n\ + \nSee [`beforeScript`](https://www.nextflow.io/docs/latest/process.html#beforeScript).\n" + type: "string" + module: + anyOf: + - description: "Environment Modules is a package manager that allows you to\ + \ dynamically configure your execution environment and easily switch between\ + \ multiple versions of the same software tool.\n\nIf it is available in\ + \ your system you can use it with Nextflow in order to configure the processes\ + \ execution environment in your pipeline.\n\nIn a process definition you\ + \ can use the `module` directive to load a specific module version to\ + \ be used in the process execution environment.\n\nSee [`module`](https://www.nextflow.io/docs/latest/process.html#module).\n" + type: "string" + - description: "Environment Modules is a package manager that allows you to\ + \ dynamically configure your execution environment and easily switch between\ + \ multiple versions of the same software tool.\n\nIf it is available in\ + \ your system you can use it with Nextflow in order to configure the processes\ + \ execution environment in your pipeline.\n\nIn a process definition you\ + \ can use the `module` directive to load a specific module version to\ + \ be used in the process execution environment.\n\nSee [`module`](https://www.nextflow.io/docs/latest/process.html#module).\n" + type: "array" + items: + type: "string" + queue: + anyOf: + - description: "The `queue` directory allows you to set the queue where jobs\ + \ are scheduled when using a grid based executor in your pipeline.\n\n\ + See [`queue`](https://www.nextflow.io/docs/latest/process.html#queue).\n" + type: "string" + - description: "The `queue` directory allows you to set the queue where jobs\ + \ are scheduled when using a grid based executor in your pipeline.\n\n\ + See [`queue`](https://www.nextflow.io/docs/latest/process.html#queue).\n" + type: "array" + items: + type: "string" + label: + anyOf: + - description: "The `label` directive allows the annotation of processes with\ + \ mnemonic identifier of your choice.\n\nSee [`label`](https://www.nextflow.io/docs/latest/process.html#label).\n" + type: "string" + - description: "The `label` directive allows the annotation of processes with\ + \ mnemonic identifier of your choice.\n\nSee [`label`](https://www.nextflow.io/docs/latest/process.html#label).\n" + type: "array" + items: + type: "string" + container: + anyOf: + - description: "The `container` directive allows you to execute the process\ + \ script in a Docker container.\n\nIt requires the Docker daemon to be\ + \ running in machine where the pipeline is executed, i.e. the local machine\ + \ when using the local executor or the cluster nodes when the pipeline\ + \ is deployed through a grid executor.\n\nViash implements allows either\ + \ a string value or a map. In case a map is used, the allowed keys are:\ + \ `registry`, `image`, and `tag`. The `image` value must be specified.\n\ + \nSee [`container`](https://www.nextflow.io/docs/latest/process.html#container).\n" + type: "object" + additionalProperties: + description: "The `container` directive allows you to execute the process\ + \ script in a Docker container.\n\nIt requires the Docker daemon to\ + \ be running in machine where the pipeline is executed, i.e. 
the local\ + \ machine when using the local executor or the cluster nodes when the\ + \ pipeline is deployed through a grid executor.\n\nViash implements\ + \ allows either a string value or a map. In case a map is used, the\ + \ allowed keys are: `registry`, `image`, and `tag`. The `image` value\ + \ must be specified.\n\nSee [`container`](https://www.nextflow.io/docs/latest/process.html#container).\n" + type: "string" + - description: "The `container` directive allows you to execute the process\ + \ script in a Docker container.\n\nIt requires the Docker daemon to be\ + \ running in machine where the pipeline is executed, i.e. the local machine\ + \ when using the local executor or the cluster nodes when the pipeline\ + \ is deployed through a grid executor.\n\nViash implements allows either\ + \ a string value or a map. In case a map is used, the allowed keys are:\ + \ `registry`, `image`, and `tag`. The `image` value must be specified.\n\ + \nSee [`container`](https://www.nextflow.io/docs/latest/process.html#container).\n" + type: "string" + publishDir: + anyOf: + - anyOf: + - description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. The allowed keywords for the map are:\ + \ `path`, `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The `path`\ + \ key and value are required.\nThe allowed values for `mode` are: `symlink`,\ + \ `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "string" + - description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. The allowed keywords for the map are:\ + \ `path`, `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The `path`\ + \ key and value are required.\nThe allowed values for `mode` are: `symlink`,\ + \ `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "object" + additionalProperties: + description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. The allowed keywords for the map are:\ + \ `path`, `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The\ + \ `path` key and value are required.\nThe allowed values for `mode`\ + \ are: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\ + \nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "string" + - description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. The allowed keywords for the map are: `path`,\ + \ `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The `path` key\ + \ and value are required.\nThe allowed values for `mode` are: `symlink`,\ + \ `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "array" + items: + anyOf: + - description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. 
The allowed keywords for the map are:\ + \ `path`, `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The\ + \ `path` key and value are required.\nThe allowed values for `mode`\ + \ are: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\ + \nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "string" + - description: "The `publishDir` directive allows you to publish the process\ + \ output files to a specified folder.\n\nViash implements this directive\ + \ as a plain string or a map. The allowed keywords for the map are:\ + \ `path`, `mode`, `overwrite`, `pattern`, `saveAs`, `enabled`. The\ + \ `path` key and value are required.\nThe allowed values for `mode`\ + \ are: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`, `move`.\n\ + \nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "object" + additionalProperties: + description: "The `publishDir` directive allows you to publish the\ + \ process output files to a specified folder.\n\nViash implements\ + \ this directive as a plain string or a map. The allowed keywords\ + \ for the map are: `path`, `mode`, `overwrite`, `pattern`, `saveAs`,\ + \ `enabled`. The `path` key and value are required.\nThe allowed\ + \ values for `mode` are: `symlink`, `rellink`, `link`, `copy`, `copyNoFollow`,\ + \ `move`.\n\nSee [`publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir).\n" + type: "string" + maxForks: + anyOf: + - description: "The `maxForks` directive allows you to define the maximum\ + \ number of process instances that can be executed in parallel. By default\ + \ this value is equals to the number of CPU cores available minus 1.\n\ + \nIf you want to execute a process in a sequential manner, set this directive\ + \ to one.\n\nSee [`maxForks`](https://www.nextflow.io/docs/latest/process.html#maxforks).\n" + type: "string" + - description: "The `maxForks` directive allows you to define the maximum\ + \ number of process instances that can be executed in parallel. 
By default\ + \ this value is equals to the number of CPU cores available minus 1.\n\ + \nIf you want to execute a process in a sequential manner, set this directive\ + \ to one.\n\nSee [`maxForks`](https://www.nextflow.io/docs/latest/process.html#maxforks).\n" + type: "integer" + maxErrors: + anyOf: + - description: "The `maxErrors` directive allows you to specify the maximum\ + \ number of times a process can fail when using the `retry` error strategy.\ + \ By default this directive is disabled.\n\nSee [`maxErrors`](https://www.nextflow.io/docs/latest/process.html#maxerrors).\n" + type: "string" + - description: "The `maxErrors` directive allows you to specify the maximum\ + \ number of times a process can fail when using the `retry` error strategy.\ + \ By default this directive is disabled.\n\nSee [`maxErrors`](https://www.nextflow.io/docs/latest/process.html#maxerrors).\n" + type: "integer" + cpus: + anyOf: + - description: "The `cpus` directive allows you to define the number of (logical)\ + \ CPU required by the process' task.\n\nSee [`cpus`](https://www.nextflow.io/docs/latest/process.html#cpus).\n" + type: "integer" + - description: "The `cpus` directive allows you to define the number of (logical)\ + \ CPU required by the process' task.\n\nSee [`cpus`](https://www.nextflow.io/docs/latest/process.html#cpus).\n" + type: "string" + accelerator: + description: "The `accelerator` directive allows you to specify the hardware\ + \ accelerator requirement for the task execution e.g. GPU processor.\n\n\ + Viash implements this directive as a map with accepted keywords: `type`,\ + \ `limit`, `request`, and `runtime`.\n\nSee [`accelerator`](https://www.nextflow.io/docs/latest/process.html#accelerator).\n" + type: "object" + additionalProperties: + description: "The `accelerator` directive allows you to specify the hardware\ + \ accelerator requirement for the task execution e.g. GPU processor.\n\ + \nViash implements this directive as a map with accepted keywords: `type`,\ + \ `limit`, `request`, and `runtime`.\n\nSee [`accelerator`](https://www.nextflow.io/docs/latest/process.html#accelerator).\n" + type: "string" + time: + description: "The `time` directive allows you to define how long a process\ + \ is allowed to run.\n\nSee [`time`](https://www.nextflow.io/docs/latest/process.html#time).\n" + type: "string" + afterScript: + description: "The `afterScript` directive allows you to execute a custom (Bash)\ + \ snippet immediately after the main process has run. This may be useful\ + \ to clean up your staging area.\n\nSee [`afterScript`](https://www.nextflow.io/docs/latest/process.html#afterscript).\n" + type: "string" + executor: + description: "The `executor` defines the underlying system where processes\ + \ are executed. By default a process uses the executor defined globally\ + \ in the nextflow.config file.\n\nThe `executor` directive allows you to\ + \ configure what executor has to be used by the process, overriding the\ + \ default configuration. The following values can be used:\n\n| Name | Executor\ + \ |\n|------|----------|\n| awsbatch | The process is executed using the\ + \ AWS Batch service. | \n| azurebatch | The process is executed using the\ + \ Azure Batch service. | \n| condor | The process is executed using the\ + \ HTCondor job scheduler. | \n| google-lifesciences | The process is executed\ + \ using the Google Genomics Pipelines service. | \n| ignite | The process\ + \ is executed using the Apache Ignite cluster. 
| \n| k8s | The process is\ + \ executed using the Kubernetes cluster. | \n| local | The process is executed\ + \ in the computer where Nextflow is launched. | \n| lsf | The process is\ + \ executed using the Platform LSF job scheduler. | \n| moab | The process\ + \ is executed using the Moab job scheduler. | \n| nqsii | The process is\ + \ executed using the NQSII job scheduler. | \n| oge | Alias for the sge\ + \ executor. | \n| pbs | The process is executed using the PBS/Torque job\ + \ scheduler. | \n| pbspro | The process is executed using the PBS Pro job\ + \ scheduler. | \n| sge | The process is executed using the Sun Grid Engine\ + \ / Open Grid Engine. | \n| slurm | The process is executed using the SLURM\ + \ job scheduler. | \n| tes | The process is executed using the GA4GH TES\ + \ service. | \n| uge | Alias for the sge executor. |\n\nSee [`executor`](https://www.nextflow.io/docs/latest/process.html#executor).\n" + type: "string" + containerOptions: + anyOf: + - description: "The `containerOptions` directive allows you to specify any\ + \ container execution option supported by the underlying container engine\ + \ (ie. Docker, Singularity, etc). This can be useful to provide container\ + \ settings only for a specific process e.g. mount a custom path.\n\nSee\ + \ [`containerOptions`](https://www.nextflow.io/docs/latest/process.html#containeroptions).\n" + type: "string" + - description: "The `containerOptions` directive allows you to specify any\ + \ container execution option supported by the underlying container engine\ + \ (ie. Docker, Singularity, etc). This can be useful to provide container\ + \ settings only for a specific process e.g. mount a custom path.\n\nSee\ + \ [`containerOptions`](https://www.nextflow.io/docs/latest/process.html#containeroptions).\n" + type: "array" + items: + type: "string" + disk: + description: "The `disk` directive allows you to define how much local disk\ + \ storage the process is allowed to use.\n\nSee [`disk`](https://www.nextflow.io/docs/latest/process.html#disk).\n" + type: "string" + tag: + description: "The `tag` directive allows you to associate each process execution\ + \ with a custom label, so that it will be easier to identify them in the\ + \ log file or in the trace execution report.\n\nSee [`tag`](https://www.nextflow.io/docs/latest/process.html#tag).\n" + type: "string" + conda: + anyOf: + - description: "The `conda` directive allows for the definition of the process\ + \ dependencies using the Conda package manager.\n\nNextflow automatically\ + \ sets up an environment for the given package names listed by in the\ + \ `conda` directive.\n\nSee [`conda`](https://www.nextflow.io/docs/latest/process.html#conda).\n" + type: "string" + - description: "The `conda` directive allows for the definition of the process\ + \ dependencies using the Conda package manager.\n\nNextflow automatically\ + \ sets up an environment for the given package names listed by in the\ + \ `conda` directive.\n\nSee [`conda`](https://www.nextflow.io/docs/latest/process.html#conda).\n" + type: "array" + items: + type: "string" + machineType: + description: " The `machineType` can be used to specify a predefined Google\ + \ Compute Platform machine type when running using the Google Life Sciences\ + \ executor.\n\nSee [`machineType`](https://www.nextflow.io/docs/latest/process.html#machinetype).\n" + type: "string" + stageInMode: + description: "The `stageInMode` directive defines how input files are staged-in\ + \ to the process work directory. 
The following values are allowed:\n\n|\ + \ Value | Description |\n|-------|-------------| \n| copy | Input files\ + \ are staged in the process work directory by creating a copy. | \n| link\ + \ | Input files are staged in the process work directory by creating an\ + \ (hard) link for each of them. | \n| symlink | Input files are staged in\ + \ the process work directory by creating a symbolic link with an absolute\ + \ path for each of them (default). | \n| rellink | Input files are staged\ + \ in the process work directory by creating a symbolic link with a relative\ + \ path for each of them. | \n\nSee [`stageInMode`](https://www.nextflow.io/docs/latest/process.html#stageinmode).\n" + type: "string" + cache: + anyOf: + - description: "The `cache` directive allows you to store the process results\ + \ to a local cache. When the cache is enabled and the pipeline is launched\ + \ with the resume option, any following attempt to execute the process,\ + \ along with the same inputs, will cause the process execution to be skipped,\ + \ producing the stored data as the actual results.\n\nThe caching feature\ + \ generates a unique key by indexing the process script and inputs. This\ + \ key is used to identify univocally the outputs produced by the process\ + \ execution.\n\nThe `cache` is enabled by default, you can disable it\ + \ for a specific process by setting the cache directive to `false`.\n\n\ + Accepted values are: `true`, `false`, `\"deep\"`, and `\"lenient\"`.\n\ + \nSee [`cache`](https://www.nextflow.io/docs/latest/process.html#cache).\n" + type: "boolean" + - description: "The `cache` directive allows you to store the process results\ + \ to a local cache. When the cache is enabled and the pipeline is launched\ + \ with the resume option, any following attempt to execute the process,\ + \ along with the same inputs, will cause the process execution to be skipped,\ + \ producing the stored data as the actual results.\n\nThe caching feature\ + \ generates a unique key by indexing the process script and inputs. 
This\ + \ key is used to identify univocally the outputs produced by the process\ + \ execution.\n\nThe `cache` is enabled by default, you can disable it\ + \ for a specific process by setting the cache directive to `false`.\n\n\ + Accepted values are: `true`, `false`, `\"deep\"`, and `\"lenient\"`.\n\ + \nSee [`cache`](https://www.nextflow.io/docs/latest/process.html#cache).\n" + type: "string" + pod: + anyOf: + - description: "The `pod` directive allows the definition of pods specific\ + \ settings, such as environment variables, secrets and config maps when\ + \ using the Kubernetes executor.\n\nSee [`pod`](https://www.nextflow.io/docs/latest/process.html#pod).\n" + type: "object" + additionalProperties: + description: "The `pod` directive allows the definition of pods specific\ + \ settings, such as environment variables, secrets and config maps when\ + \ using the Kubernetes executor.\n\nSee [`pod`](https://www.nextflow.io/docs/latest/process.html#pod).\n" + type: "string" + - description: "The `pod` directive allows the definition of pods specific\ + \ settings, such as environment variables, secrets and config maps when\ + \ using the Kubernetes executor.\n\nSee [`pod`](https://www.nextflow.io/docs/latest/process.html#pod).\n" + type: "array" + items: + type: "object" + additionalProperties: + type: "string" + penv: + description: "The `penv` directive allows you to define the parallel environment\ + \ to be used when submitting a parallel task to the SGE resource manager.\n\ + \nSee [`penv`](https://www.nextflow.io/docs/latest/process.html#penv).\n" + type: "string" + scratch: + anyOf: + - description: "The `scratch` directive allows you to execute the process\ + \ in a temporary folder that is local to the execution node.\n\nSee [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch).\n" + type: "boolean" + - description: "The `scratch` directive allows you to execute the process\ + \ in a temporary folder that is local to the execution node.\n\nSee [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch).\n" + type: "string" + storeDir: + description: "The `storeDir` directive allows you to define a directory that\ + \ is used as a permanent cache for your process results.\n\nSee [`storeDir`](https://www.nextflow.io/docs/latest/process.html#storeDir).\n" + type: "string" + maxRetries: + anyOf: + - description: "The `maxRetries` directive allows you to define the maximum\ + \ number of times a process instance can be re-submitted in case of failure.\ + \ This value is applied only when using the retry error strategy. By default\ + \ only one retry is allowed.\n\nSee [`maxRetries`](https://www.nextflow.io/docs/latest/process.html#maxretries).\n" + type: "string" + - description: "The `maxRetries` directive allows you to define the maximum\ + \ number of times a process instance can be re-submitted in case of failure.\ + \ This value is applied only when using the retry error strategy. By default\ + \ only one retry is allowed.\n\nSee [`maxRetries`](https://www.nextflow.io/docs/latest/process.html#maxretries).\n" + type: "integer" + echo: + anyOf: + - description: "By default the stdout produced by the commands executed in\ + \ all processes is ignored. 
By setting the `echo` directive to true, you\ + \ can forward the process stdout to the current top running process stdout\ + \ file, showing it in the shell terminal.\n \nSee [`echo`](https://www.nextflow.io/docs/latest/process.html#echo).\n" + type: "boolean" + - description: "By default the stdout produced by the commands executed in\ + \ all processes is ignored. By setting the `echo` directive to true, you\ + \ can forward the process stdout to the current top running process stdout\ + \ file, showing it in the shell terminal.\n \nSee [`echo`](https://www.nextflow.io/docs/latest/process.html#echo).\n" + type: "string" + errorStrategy: + description: "The `errorStrategy` directive allows you to define how an error\ + \ condition is managed by the process. By default when an error status is\ + \ returned by the executed script, the process stops immediately. This in\ + \ turn forces the entire pipeline to terminate.\n\nTable of available error\ + \ strategies:\n| Name | Executor |\n|------|----------|\n| `terminate` |\ + \ Terminates the execution as soon as an error condition is reported. Pending\ + \ jobs are killed (default) |\n| `finish` | Initiates an orderly pipeline\ + \ shutdown when an error condition is raised, waiting the completion of\ + \ any submitted job. |\n| `ignore` | Ignores processes execution errors.\ + \ |\n| `retry` | Re-submit for execution a process returning an error condition.\ + \ |\n\nSee [`errorStrategy`](https://www.nextflow.io/docs/latest/process.html#errorstrategy).\n" + type: "string" + memory: + description: "The `memory` directive allows you to define how much memory\ + \ the process is allowed to use.\n\nSee [`memory`](https://www.nextflow.io/docs/latest/process.html#memory).\n" + type: "string" + stageOutMode: + description: "The `stageOutMode` directive defines how output files are staged-out\ + \ from the scratch directory to the process work directory. The following\ + \ values are allowed:\n\n| Value | Description |\n|-------|-------------|\ + \ \n| copy | Output files are copied from the scratch directory to the work\ + \ directory. | \n| move | Output files are moved from the scratch directory\ + \ to the work directory. | \n| rsync | Output files are copied from the\ + \ scratch directory to the work directory by using the rsync utility. |\n\ + \nSee [`stageOutMode`](https://www.nextflow.io/docs/latest/process.html#stageoutmode).\n" + type: "string" + required: [] + additionalProperties: false + NextflowAuto: + description: "Automated processing flags which can be toggled on or off." + type: "object" + properties: + simplifyInput: + description: "If `true`, an input tuple only containing only a single File\ + \ (e.g. `[\"foo\", file(\"in.h5ad\")]`) is automatically transformed to\ + \ a map (i.e. `[\"foo\", [ input: file(\"in.h5ad\") ] ]`).\n\nDefault: `true`.\n" + type: "boolean" + simplifyOutput: + description: "If `true`, an output tuple containing a map with a File (e.g.\ + \ `[\"foo\", [ output: file(\"out.h5ad\") ] ]`) is automatically transformed\ + \ to a map (i.e. 
`[\"foo\", file(\"out.h5ad\")]`).\n\nDefault: `true`.\n" + type: "boolean" + publish: + description: "If `true`, the module's outputs are automatically published\ + \ to `params.publishDir`.\nWill throw an error if `params.publishDir` is\ + \ not defined.\n\nDefault: `false`.\n" + type: "boolean" + transcript: + description: "If `true`, the module's transcripts from `work/` are automatically\ + \ published to `params.transcriptDir`.\nIf not defined, `params.publishDir\ + \ + \"/_transcripts\"` will be used.\nWill throw an error if neither are\ + \ defined.\n\nDefault: `false`.\n" + type: "boolean" + required: [] + additionalProperties: false + NextflowConfig: + description: "Allows tweaking how the Nextflow Config file is generated." + type: "object" + properties: + labels: + description: "A series of default labels to specify memory and cpu constraints.\n\ + \nThe default memory labels are defined as \"mem1gb\", \"mem2gb\", \"mem4gb\"\ + , ... upto \"mem512tb\" and follows powers of 2.\nThe default cpu labels\ + \ are defined as \"cpu1\", \"cpu2\", \"cpu5\", \"cpu10\", ... upto \"cpu1000\"\ + \ and follows a semi logarithmic scale (1, 2, 5 per decade).\n\nConceptually\ + \ it is possible for a Viash Config to overwrite the full labels parameter,\ + \ however likely it is more efficient to add additional labels\nin the Viash\ + \ Project with a config mod.\n" + type: "object" + additionalProperties: + description: "A series of default labels to specify memory and cpu constraints.\n\ + \nThe default memory labels are defined as \"mem1gb\", \"mem2gb\", \"\ + mem4gb\", ... upto \"mem512tb\" and follows powers of 2.\nThe default\ + \ cpu labels are defined as \"cpu1\", \"cpu2\", \"cpu5\", \"cpu10\", ...\ + \ upto \"cpu1000\" and follows a semi logarithmic scale (1, 2, 5 per decade).\n\ + \nConceptually it is possible for a Viash Config to overwrite the full\ + \ labels parameter, however likely it is more efficient to add additional\ + \ labels\nin the Viash Project with a config mod.\n" + type: "string" + script: + anyOf: + - description: "Includes a single string or list of strings into the nextflow.config\ + \ file.\nThis can be used to add custom profiles or include an additional\ + \ config file.\n" + type: "string" + - description: "Includes a single string or list of strings into the nextflow.config\ + \ file.\nThis can be used to add custom profiles or include an additional\ + \ config file.\n" + type: "array" + items: + type: "string" + required: [] + additionalProperties: false + DockerSetupStrategy: + $comment: "TODO add descriptions to different strategies" + enum: + - "cb" + - "ifneedbepullelsecachedbuild" + - "donothing" + - "gentlepush" + - "alwayspullelsebuild" + - "build" + - "alwayspull" + - "alwaysbuild" + - "ifneedbebuild" + - "pullelsebuild" + - "p" + - "alwayspullelsecachedbuild" + - "pull" + - "maybepush" + - "ifneedbepullelsebuild" + - "cachedbuild" + - "pullelsecachedbuild" + - "push" + - "forcepush" + - "alwayspush" + - "b" + - "pushifnotpresent" + - "alwayscachedbuild" + - "meh" + - "ifneedbepull" + - "ifneedbecachedbuild" + description: "The Docker setup strategy to use when building a container." + Direction: + enum: + - "input" + - "output" + description: "Makes this argument an `input` or an `output`, as in does the file/folder\ + \ needs to be read or written. `input` by default." + Status: + enum: + - "enabled" + - "disabled" + - "deprecated" + description: "Allows setting a component to active, deprecated or disabled." 
+ DockerResolveVolume: + $comment: "TODO make fully case insensitive" + enum: + - "manual" + - "automatic" + - "auto" + - "Manual" + - "Automatic" + - "Auto" + description: "Enables or disables automatic volume mapping. Enabled when set to\ + \ `Automatic` or disabled when set to `Manual`. Default: `Automatic`" diff --git a/src/common/schemas/task_control_method.yaml b/src/common/schemas/task_control_method.yaml new file mode 100644 index 0000000000..8d62f6be43 --- /dev/null +++ b/src/common/schemas/task_control_method.yaml @@ -0,0 +1,68 @@ +title: Control Method +description: | + A control method is used to test the relative performance of all other methods, + and also as a quality control for the pipeline as a whole. A control method can + either be a positive control or a negative control. The positive control and + negative control methods set a maximum and minimum threshold for performance, + so any new method should perform better than the negative control methods and + worse than the positive control method. +type: object +required: [__merge__, functionality, platforms] +properties: + __merge__: + "$ref": "defs_common.yaml#/definitions/CompAPIMerge" + functionality: + type: object + description: Information regarding the functionality of the component. + required: [name, info, resources] + additionalProperties: false + properties: + name: + "$ref": "defs_common.yaml#/definitions/Name" + status: + "$ref": "defs_viash.yaml#/definitions/Status" + info: + type: object + description: Metadata of the component. + additionalProperties: false + required: [label, summary, description, preferred_normalization] + properties: + label: + "$ref": "defs_common.yaml#/definitions/Label" + summary: + "$ref": "defs_common.yaml#/definitions/Summary" + description: + "$ref": "defs_common.yaml#/definitions/Description" + preferred_normalization: + "$ref": "defs_common.yaml#/definitions/PreferredNormalization" + reference: + "$ref": "defs_common.yaml#/definitions/BibtexReference" + documentation_url: + "$ref": "defs_common.yaml#/definitions/DocumentationURL" + repository_url: + "$ref": "defs_common.yaml#/definitions/RepositoryURL" + variants: + "$ref": "defs_common.yaml#/definitions/MethodVariants" + arguments: + type: array + description: Component-specific parameters. + items: + "$ref": "defs_viash.yaml#/definitions/Argument" + resources: + type: array + description: Resources required to run the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + test_resources: + type: array + description: One or more scripts and resources used to test the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + platforms: + type: array + description: A list of platforms which Viash generates target artifacts for. + items: + anyOf: + - "$ref": "defs_common.yaml#/definitions/PlatformDocker" + - "$ref": "defs_common.yaml#/definitions/PlatformNative" + - "$ref": "defs_common.yaml#/definitions/PlatformVdsl3" diff --git a/src/common/schemas/task_info.yaml b/src/common/schemas/task_info.yaml new file mode 100644 index 0000000000..be6a1e3447 --- /dev/null +++ b/src/common/schemas/task_info.yaml @@ -0,0 +1,22 @@ +title: Task info +description: A file format specification file. 
+type: object +additionalProperties: false +required: [name, label, summary, motivation, description] +properties: + name: + $ref: "defs_common.yaml#/definitions/Name" + label: + $ref: "defs_common.yaml#/definitions/Label" + summary: + $ref: "defs_common.yaml#/definitions/Summary" + image: + $ref: "defs_common.yaml#/definitions/Image" + motivation: + $ref: "defs_common.yaml#/definitions/Description" + description: + $ref: "defs_common.yaml#/definitions/Description" + authors: + type: array + items: + $ref: "defs_common.yaml#/definitions/Author" diff --git a/src/common/schemas/task_method.yaml b/src/common/schemas/task_method.yaml new file mode 100644 index 0000000000..25c59c7a47 --- /dev/null +++ b/src/common/schemas/task_method.yaml @@ -0,0 +1,65 @@ +title: Method +description: | + A method is a specific technique used to solve the task problem and is + compared to the control methods and other methods to determine the best + approach for the task depending on the type of dataset. +type: object +required: [__merge__, functionality, platforms] +properties: + __merge__: + "$ref": "defs_common.yaml#/definitions/CompAPIMerge" + functionality: + type: object + description: Information regarding the functionality of the component. + required: [name, info, resources] + additionalProperties: false + properties: + name: + "$ref": "defs_common.yaml#/definitions/Name" + status: + "$ref": "defs_viash.yaml#/definitions/Status" + info: + type: object + description: Metadata of the component. + additionalProperties: false + required: [label, summary, description, preferred_normalization, reference, documentation_url, repository_url] + properties: + label: + "$ref": "defs_common.yaml#/definitions/Label" + summary: + "$ref": "defs_common.yaml#/definitions/Summary" + description: + "$ref": "defs_common.yaml#/definitions/Description" + preferred_normalization: + "$ref": "defs_common.yaml#/definitions/PreferredNormalization" + reference: + "$ref": "defs_common.yaml#/definitions/BibtexReference" + documentation_url: + "$ref": "defs_common.yaml#/definitions/DocumentationURL" + repository_url: + "$ref": "defs_common.yaml#/definitions/RepositoryURL" + variants: + "$ref": "defs_common.yaml#/definitions/MethodVariants" + arguments: + type: array + description: Component-specific parameters. + items: + "$ref": "defs_viash.yaml#/definitions/Argument" + resources: + type: array + description: Resources required to run the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + test_resources: + type: array + description: One or more scripts and resources used to test the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + platforms: + type: array + description: A list of platforms which Viash generates target artifacts for. + items: + anyOf: + - "$ref": "defs_common.yaml#/definitions/PlatformDocker" + - "$ref": "defs_common.yaml#/definitions/PlatformNative" + - "$ref": "defs_common.yaml#/definitions/PlatformVdsl3" diff --git a/src/common/schemas/task_metric.yaml b/src/common/schemas/task_metric.yaml new file mode 100644 index 0000000000..35932e9e7a --- /dev/null +++ b/src/common/schemas/task_metric.yaml @@ -0,0 +1,86 @@ +title: Metric +description: | + A metric is a quantitative measure used to evaluate the performance of the + different methods in solving the specific task problem. 
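+# Example (illustrative; metric names and values are assumptions, and other required
+# fields such as `name`, `resources` and `platforms` are omitted for brevity): the
+# `functionality.info.metrics` block of a metric component that would validate
+# against the schema below could look like this:
+#
+#   functionality:
+#     info:
+#       metrics:
+#         - name: accuracy
+#           label: Accuracy
+#           summary: Fraction of predictions that match the ground-truth labels.
+#           description: The proportion of correctly predicted labels, computed per dataset.
+#           reference: somebibtexkey2023
+#           min: 0
+#           max: 1
+#           maximize: true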
+type: object +required: [__merge__, functionality, platforms] +properties: + __merge__: + "$ref": "defs_common.yaml#/definitions/CompAPIMerge" + functionality: + type: object + description: Information regarding the functionality of the component. + required: [name, info, resources] + additionalProperties: false + properties: + name: + "$ref": "defs_common.yaml#/definitions/Name" + status: + "$ref": "defs_viash.yaml#/definitions/Status" + info: + type: object + description: Metadata of the component. + additionalProperties: false + required: [metrics] + properties: + metrics: + type: array + minItems: 1 + items: + type: object + description: Metadata of each metric. + additionalProperties: false + required: [label, summary, description, reference, min, max, maximize] + properties: + name: + "$ref": "defs_common.yaml#/definitions/Name" + label: + "$ref": "defs_common.yaml#/definitions/Label" + summary: + "$ref": "defs_common.yaml#/definitions/Summary" + description: + "$ref": "defs_common.yaml#/definitions/Description" + reference: + "$ref": "defs_common.yaml#/definitions/BibtexReference" + documentation_url: + "$ref": "defs_common.yaml#/definitions/DocumentationURL" + repository_url: + "$ref": "defs_common.yaml#/definitions/RepositoryURL" + variants: + "$ref": "defs_common.yaml#/definitions/MethodVariants" + min: + description: The lowest possible value of the metric. + oneOf: + - type: number + - const: "-.inf" + max: + description: The highest possible value of the metric. + oneOf: + - type: number + - const: "+.inf" + maximize: + type: boolean + description: Whether a higher metric value is better. + arguments: + type: array + description: Component-specific parameters. + items: + "$ref": "defs_viash.yaml#/definitions/Argument" + resources: + type: array + description: Resources required to run the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + test_resources: + type: array + description: One or more scripts and resources used to test the component. + items: + "$ref": "defs_viash.yaml#/definitions/Resource" + platforms: + type: array + description: A list of platforms which Viash generates target artifacts for. + items: + anyOf: + - "$ref": "defs_common.yaml#/definitions/PlatformDocker" + - "$ref": "defs_common.yaml#/definitions/PlatformNative" + - "$ref": "defs_common.yaml#/definitions/PlatformVdsl3" diff --git a/src/common/sync_test_resources/config.vsh.yaml b/src/common/sync_test_resources/config.vsh.yaml new file mode 100644 index 0000000000..f443d634e8 --- /dev/null +++ b/src/common/sync_test_resources/config.vsh.yaml @@ -0,0 +1,44 @@ +functionality: + name: "sync_test_resources" + namespace: "common" + version: "dev" + description: Synchronise the test resources from s3 to resources_test + usage: | + sync_test_resources + sync_test_resources --input s3://openproblems-data/resources_test --output resources_test + arguments: + - name: "--input" + alternatives: ["-i"] + type: string + description: "Path to the S3 bucket to sync from." + default: "s3://openproblems-data/resources_test" + - name: "--output" + alternatives: ["-o"] + type: file + default: resources_test + direction: output + description: "Path to the test resource directory." + - name: "--quiet" + type: boolean_true + description: "Does not display the operations performed by the specified command." + - name: "--dryrun" + type: boolean_true + description: "Displays the operations that would be performed using the specified command without actually running them."
+ - name: "--delete" + type: boolean_true + description: "Files that exist in the destination but not in the source are deleted during sync." + - name: "--exclude" + type: "string" + multiple: true + description: Exclude all files or objects from the command that matches the specified pattern. + resources: + - type: bash_script + path: script.sh + test_resources: + - type: bash_script + path: run_test.sh +platforms: + - type: docker + image: "amazon/aws-cli:2.7.12" + - type: native + - type: nextflow diff --git a/src/common/sync_test_resources/run_test.sh b/src/common/sync_test_resources/run_test.sh new file mode 100755 index 0000000000..67f2504531 --- /dev/null +++ b/src/common/sync_test_resources/run_test.sh @@ -0,0 +1,15 @@ +#!/bin/bash + +## VIASH START +## VIASH END + +echo ">> Run aws s3 sync" +./$meta_functionality_name \ + --input s3://openproblems-data/resources_test/common/pancreas \ + --output foo \ + --quiet + +echo ">> Check whether the right files were copied" +[ ! -f foo/dataset.h5ad ] && echo csv should have been copied && exit 1 + +echo ">> Test succeeded!" \ No newline at end of file diff --git a/src/common/sync_test_resources/script.sh b/src/common/sync_test_resources/script.sh new file mode 100644 index 0000000000..c97b9fcdfd --- /dev/null +++ b/src/common/sync_test_resources/script.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +## VIASH START +par_input='s3://openproblems-data/resources_test' +par_output='resources_test' +## VIASH END + +extra_params=( ) + +if [ "$par_quiet" == "true" ]; then + extra_params+=( "--quiet" ) +fi +if [ "$par_dryrun" == "true" ]; then + extra_params+=( "--dryrun" ) +fi +if [ "$par_delete" == "true" ]; then + extra_params+=( "--delete" ) +fi + +if [ ! -z ${par_exclude+x} ]; then + IFS=":" + for var in $par_exclude; do + unset IFS + extra_params+=( "--exclude" "$var" ) + done +fi + + +# Disable the use of the Amazon EC2 instance metadata service (IMDS). 
+# see https://florian.ec/blog/github-actions-awscli-errors/ +# or https://github.com/aws/aws-cli/issues/5234#issuecomment-705831465 +export AWS_EC2_METADATA_DISABLED=true + +aws s3 sync "$par_input" "$par_output" --no-sign-request "${extra_params[@]}" diff --git a/src/datasets/README.md b/src/datasets/README.md new file mode 100644 index 0000000000..a27e061326 --- /dev/null +++ b/src/datasets/README.md @@ -0,0 +1,219 @@ + +- Common datasets + - Pipeline + topology + - File format API + - Dataset+Pca+Hvg + - Normalized Dataset + - Dataset+Pca + - Raw Dataset + - Component API + - Dataset Loader + - Normalization + - Processor Hvg + - Processor Pca + +# Common datasets + +## Pipeline topology + +``` mermaid +%%| column: screen-inset-shaded +flowchart LR + file_dataset(Dataset+Pca+Hvg) + file_normalized(Normalized Dataset) + file_pca(Dataset+Pca) + file_raw(Raw Dataset) + comp_dataset_loader[/Dataset Loader/] + comp_normalization[/Normalization/] + comp_processor_hvg[/Processor Hvg/] + comp_processor_pca[/Processor Pca/] + file_raw---comp_normalization + file_pca---comp_processor_hvg + file_normalized---comp_processor_pca + comp_dataset_loader-->file_raw + comp_normalization-->file_normalized + comp_processor_hvg-->file_dataset + comp_processor_pca-->file_pca +``` + +## File format API + +### `Dataset+Pca+Hvg` + +A normalised data with a PCA embedding and HVG selection + +Used in: + +- [processor hvg](#processor%20hvg): output (as output) + +Slots: + +| struct | name | type | description | +|:-------|:-----------------|:--------|:------------------------------------------------------------------------| +| layers | counts | integer | Raw counts | +| layers | normalized | double | Normalised expression values | +| obs | celltype | string | Cell type information | +| obs | batch | string | Batch information | +| obs | tissue | string | Tissue information | +| obs | size_factors | double | The size factors created by the normalisation method, if any. | +| var | hvg | boolean | Whether or not the feature is considered to be a ‘highly variable gene’ | +| var | hvg_score | integer | A ranking of the features by hvg. | +| obsm | X_pca | double | The resulting PCA embedding. | +| varm | pca_loadings | double | The PCA loadings matrix. | +| uns | dataset_id | string | A unique identifier for the dataset | +| uns | normalization_id | string | Which normalization was used | +| uns | pca_variance | double | The PCA variance objects. | + +Example: + + AnnData object + obs: 'celltype', 'batch', 'tissue', 'size_factors' + var: 'hvg', 'hvg_score' + uns: 'dataset_id', 'normalization_id', 'pca_variance' + obsm: 'X_pca' + varm: 'pca_loadings' + layers: 'counts', 'normalized' + +### `Normalized Dataset` + +A normalized dataset + +Used in: + +- [normalization](#normalization): output (as output) +- [processor pca](#processor%20pca): input (as input) + +Slots: + +| struct | name | type | description | +|:-------|:-----------------|:--------|:--------------------------------------------------------------| +| layers | counts | integer | Raw counts | +| layers | normalized | double | Normalised expression values | +| obs | celltype | string | Cell type information | +| obs | batch | string | Batch information | +| obs | tissue | string | Tissue information | +| obs | size_factors | double | The size factors created by the normalisation method, if any. 
| +| uns | dataset_id | string | A unique identifier for the dataset | +| uns | normalization_id | string | Which normalization was used | + +Example: + + AnnData object + obs: 'celltype', 'batch', 'tissue', 'size_factors' + uns: 'dataset_id', 'normalization_id' + layers: 'counts', 'normalized' + +### `Dataset+Pca` + +A normalised data with a PCA embedding + +Used in: + +- [processor hvg](#processor%20hvg): input (as input) +- [processor pca](#processor%20pca): output (as output) + +Slots: + +| struct | name | type | description | +|:-------|:-----------------|:--------|:--------------------------------------------------------------| +| layers | counts | integer | Raw counts | +| layers | normalized | double | Normalised expression values | +| obs | celltype | string | Cell type information | +| obs | batch | string | Batch information | +| obs | tissue | string | Tissue information | +| obs | size_factors | double | The size factors created by the normalisation method, if any. | +| obsm | X_pca | double | The resulting PCA embedding. | +| varm | pca_loadings | double | The PCA loadings matrix. | +| uns | dataset_id | string | A unique identifier for the dataset | +| uns | normalization_id | string | Which normalization was used | +| uns | pca_variance | double | The PCA variance objects. | + +Example: + + AnnData object + obs: 'celltype', 'batch', 'tissue', 'size_factors' + uns: 'dataset_id', 'normalization_id', 'pca_variance' + obsm: 'X_pca' + varm: 'pca_loadings' + layers: 'counts', 'normalized' + +### `Raw Dataset` + +An unprocessed dataset as output by a dataset loader. + +Used in: + +- [dataset loader](#dataset%20loader): output (as output) +- [normalization](#normalization): input (as input) + +Slots: + +| struct | name | type | description | +|:-------|:-----------|:--------|:------------------------------------| +| layers | counts | integer | Raw counts | +| obs | celltype | string | Cell type information | +| obs | batch | string | Batch information | +| obs | tissue | string | Tissue information | +| uns | dataset_id | string | A unique identifier for the dataset | + +Example: + + AnnData object + obs: 'celltype', 'batch', 'tissue' + uns: 'dataset_id' + layers: 'counts' + +## Component API + +### `Dataset Loader` + +Arguments: + +| Name | Type | Direction | Description | +|:-----------|:------------------------------|:----------|:------------------------------------------------------| +| `--output` | [Raw Dataset](#Raw%20dataset) | output | An unprocessed dataset as output by a dataset loader. | + +### `Normalization` + +Arguments: + +| Name | Type | Direction | Description | +|:---------------------|:--------------------------------------------|:----------|:-------------------------------------------------------------| +| `--input` | [Raw Dataset](#Raw%20dataset) | input | An unprocessed dataset as output by a dataset loader. | +| `--output` | [Normalized Dataset](#Normalized%20dataset) | output | A normalized dataset | +| `--layer_output` | `string` | input | The name of the layer in which to store the normalized data. | +| `--obs_size_factors` | `string` | input | In which .obs slot to store the size factors (if any). 
| + +### `Processor Hvg` + +Arguments: + +| Name | Type | Direction | Description | +|:------------------|:------------------------------------|:----------|:---------------------------------------------------------------------------| +| `--input` | [Dataset+Pca](#Dataset+PCA) | input | A normalised data with a PCA embedding | +| `--layer_input` | `string` | input | Which layer to use as input for the PCA. | +| `--output` | [Dataset+Pca+Hvg](#Dataset+PCA+HVG) | output | A normalised data with a PCA embedding and HVG selection | +| `--var_hvg` | `string` | input | In which .var slot to store whether a feature is considered to be hvg. | +| `--var_hvg_score` | `string` | input | In which .var slot to store whether a ranking of the features by variance. | +| `--num_features` | `integer` | input | The number of HVG to select | + +### `Processor Pca` + +Arguments: + +| Name | Type | Direction | Description | +|:-------------------|:--------------------------------------------|:----------|:---------------------------------------------------------------------------------------------------------------------| +| `--input` | [Normalized Dataset](#Normalized%20dataset) | input | A normalized dataset | +| `--layer_input` | `string` | input | Which layer to use as input for the PCA. | +| `--output` | [Dataset+Pca](#Dataset+PCA) | output | A normalised data with a PCA embedding | +| `--obsm_embedding` | `string` | input | In which .obsm slot to store the resulting embedding. | +| `--varm_loadings` | `string` | input | In which .varm slot to store the resulting loadings matrix. | +| `--uns_variance` | `string` | input | In which .uns slot to store the resulting variance objects. | +| `--num_components` | `integer` | input | Number of principal components to compute. Defaults to 50, or 1 - minimum dimension size of selected representation. | diff --git a/src/datasets/README.qmd b/src/datasets/README.qmd new file mode 100644 index 0000000000..c20045fadc --- /dev/null +++ b/src/datasets/README.qmd @@ -0,0 +1,203 @@ +--- +format: gfm +toc: true +--- + +```{r setup, include=FALSE} +library(tidyverse) +library(rlang) + +strip_margin <- function(text, symbol = "\\|") { + str_replace_all(text, paste0("(\n?)[ \t]*", symbol), "\\1") +} + +dir <- "src/datasets" +dir <- "." +``` + +# Common datasets + + + +## Pipeline topology + +```{r data, include=FALSE} +comp_yamls <- list.files(paste0(dir, "/api"), pattern = "comp_", full.names = TRUE) +file_yamls <- list.files(paste0(dir, "/api"), pattern = "file_", full.names = TRUE) + +comp_file <- map_df(comp_yamls, function(yaml_file) { + conf <- yaml::read_yaml(yaml_file) + + map_df(conf$functionality$arguments, function(arg) { + df <- tibble( + comp_name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + type = arg$type, + arg_name = str_replace_all(arg$name, "^-*", ""), + direction = arg$direction %||% "input", + description = arg$description + ) + if ("__merge__" %in% names(arg)) { + df$file_name <- basename(arg$`__merge__`) %>% gsub("\\.yaml", "", .) + } + df + }) +}) + +comp_info <- map_df(comp_yamls, function(yaml_file) { + conf <- yaml::read_yaml(yaml_file) + + tibble( + name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + label = name %>% gsub("comp_", "", .) %>% gsub("_", " ", .) 
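+    # derive a human-readable label from the yaml filename,
+    # e.g. "comp_dataset_loader.yaml" becomes "dataset loader"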
+ ) +}) + +file_info <- map_df(file_yamls, function(yaml_file) { + arg <- yaml::read_yaml(yaml_file) + + tibble( + name = basename(yaml_file) %>% gsub("\\.yaml", "", .), + description = arg$description, + example = arg$example, + label = arg$info$label %||% (name %>% gsub("file_", "", .) %>% gsub("_", " ", .)) + ) +}) + +file_slot <- map_df(file_yamls, function(yaml_file) { + arg <- yaml::read_yaml(yaml_file) + + map2_df(names(arg$info$slots), arg$info$slots, function(group_name, slot) { + df <- map_df(slot, as.data.frame) + df$struct <- group_name + df$file_name <- basename(yaml_file) %>% gsub("\\.yaml", "", .) + df$multiple <- df$multiple %||% FALSE %|% FALSE + as_tibble(df) + }) +}) +``` + +```{r flow, echo=FALSE,warning=FALSE,error=FALSE} +nodes <- bind_rows( + file_info %>% + transmute(id = name, label = str_to_title(label), is_comp = FALSE), + comp_info %>% + transmute(id = name, label = str_to_title(label), is_comp = TRUE) +) %>% + mutate(str = paste0( + " ", + id, + ifelse(is_comp, "[/", "("), + label, + ifelse(is_comp, "/]", ")") + )) +edges <- bind_rows( + comp_file %>% + filter(direction == "input", !is.na(file_name)) %>% + transmute( + from = file_name, + to = comp_name, + arrow = "---" + ), + comp_file %>% + filter(direction == "output", !is.na(file_name)) %>% + transmute( + from = comp_name, + to = file_name, + arrow = "-->" + ) +) %>% + mutate(str = paste0(" ", from, arrow, to)) + +# note: use ```{mermaid} instead of ```mermaid when rendering to html +out_str <- strip_margin(glue::glue(" + §```mermaid + §%%| column: screen-inset-shaded + §flowchart LR + §{paste(nodes$str, collapse = '\n')} + §{paste(edges$str, collapse = '\n')} + §``` + §"), symbol = "§") +knitr::asis_output(out_str) +``` + +## File format API + +```{r file_api, echo=FALSE,warning=FALSE,error=FALSE,output="asis"} +for (file_name in file_info$name) { + arg_info <- file_info %>% filter(name == file_name) + sub_out <- file_slot %>% + filter(file_name == !!file_name) %>% + select(struct, name, type, description) + + used_in <- comp_file %>% + filter(file_name == !!file_name) %>% + left_join(comp_info %>% select(comp_name = name, comp_label = label), by = "comp_name") %>% + mutate(str = paste0("* [", comp_label, "](#", comp_label, "): ", arg_name, " (as ", direction, ")")) %>% + pull(str) + + example <- sub_out %>% + group_by(struct) %>% + summarise( + str = paste0(unique(struct), ": ", paste0("'", name, "'", collapse = ", ")) + ) %>% + arrange(match(struct, c("obs", "var", "uns", "obsm", "obsp", "varm", "varp", "layers"))) + + example_str <- c(" AnnData object", paste0(" ", example$str)) + + out_str <- strip_margin(glue::glue(" + §### `{str_to_title(arg_info$label)}` + § + §{arg_info$description} + § + §Used in: + § + §{paste(used_in, collapse = '\n')} + § + §Slots: + § + §{paste(knitr::kable(sub_out, format = 'pipe'), collapse = '\n')} + § + §Example: + § + §{paste(example_str, collapse = '\n')} + § + §"), symbol = "§") + cat(out_str) +} +``` + + + +## Component API + +```{r comp_api, echo=FALSE,warning=FALSE,error=FALSE,output="asis"} +# todo: add description +# todo: add required info fields +for (comp_name in comp_info$name) { + comp <- comp_info %>% filter(name == comp_name) + sub_out <- comp_file %>% + filter(comp_name == !!comp_name) %>% + left_join(file_info %>% select(file_name = name, file_desc = description, file_label = label), by = "file_name") %>% + transmute( + Name = paste0("`--", arg_name, "`"), + Type = ifelse( + is.na(file_label), + paste0("`", type, "`"), + paste0("[", 
str_to_title(file_label), "](#", file_label, ")") + ), + Direction = direction, + Description = description %|% file_desc + ) + + out_str <- strip_margin(glue::glue(" + §### `{str_to_title(comp$label)}` + § + §{ifelse(\"description\" %in% names(comp), comp$description, \"\")} + § + §Arguments: + § + §{paste(knitr::kable(sub_out, format = 'pipe'), collapse = '\n')} + §"), symbol = "§") + cat(out_str) +} +``` \ No newline at end of file diff --git a/src/datasets/api/README.md b/src/datasets/api/README.md new file mode 100644 index 0000000000..7c3b9c8d87 --- /dev/null +++ b/src/datasets/api/README.md @@ -0,0 +1,8 @@ +# Component and file format specifications + +This folder contains specifications for file formats and component +interfaces. + +These are not only used for documentation (i.e. to document the file +format of inputs and outputs of a component), but also for unit testing +and validation of output files. diff --git a/src/datasets/api/README.qmd b/src/datasets/api/README.qmd new file mode 100644 index 0000000000..d31a99367e --- /dev/null +++ b/src/datasets/api/README.qmd @@ -0,0 +1,8 @@ +--- +title: Component and file format specifications +format: gfm +--- + +This folder contains specifications for file formats and component interfaces. + +These are not only used for documentation (i.e. to document the file format of inputs and outputs of a component), but also for unit testing and validation of output files. \ No newline at end of file diff --git a/src/datasets/api/comp_dataset_loader.yaml b/src/datasets/api/comp_dataset_loader.yaml new file mode 100644 index 0000000000..75909b106a --- /dev/null +++ b/src/datasets/api/comp_dataset_loader.yaml @@ -0,0 +1,16 @@ +functionality: + namespace: "datasets/loaders" + info: + type: dataset_loader + type_info: + label: Dataset loader + summary: A component which generates a "Common dataset". + description: | + A dataset loader will typically have an identifier (e.g. a GEO identifier) + or URL as input argument and additional arguments to define where the script needs to download a dataset from and how to process it. + arguments: + - name: "--output" + __merge__: file_raw.yaml + direction: "output" + required: true + test_resources: [] \ No newline at end of file diff --git a/src/datasets/api/comp_normalization.yaml b/src/datasets/api/comp_normalization.yaml new file mode 100644 index 0000000000..6f2c1ffa64 --- /dev/null +++ b/src/datasets/api/comp_normalization.yaml @@ -0,0 +1,36 @@ +functionality: + namespace: "datasets/normalization" + info: + type: dataset_normalization + type_info: + label: Dataset normalization + summary: | + A normalization method which processes the raw counts into a normalized dataset. + description: + A component for normalizing the raw counts as output by dataset loaders into a normalized dataset. + arguments: + - name: "--input" + __merge__: file_raw.yaml + direction: input + required: true + - name: "--output" + __merge__: file_normalized.yaml + direction: output + required: true + - name: "--normalization_id" + type: string + description: "The normalization id to store in the dataset metadata. If not specified, the functionality name will be used." + required: false + - name: "--layer_output" + type: string + default: "normalized" + description: The name of the layer in which to store the normalized data. + - name: "--obs_size_factors" + type: string + default: "size_factors" + description: In which .obs slot to store the size factors (if any). 
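+  # Components implementing this API are tested with the shared
+  # run_and_check_adata.py script against the pancreas test resources listed
+  # below; the file format specs pulled in via __merge__ are also used to
+  # validate the output (see src/datasets/api/README.md).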
+ test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py diff --git a/src/datasets/api/comp_processor_hvg.yaml b/src/datasets/api/comp_processor_hvg.yaml new file mode 100644 index 0000000000..2e24033aac --- /dev/null +++ b/src/datasets/api/comp_processor_hvg.yaml @@ -0,0 +1,40 @@ +functionality: + namespace: "datasets/processors" + info: + type: dataset_processor + type_info: + label: HVG + summary: | + Computes the highly variable genes scores. + description: | + The resulting AnnData will contain both a boolean 'hvg' column in 'var', as well as a numerical 'hvg_score' in 'var'. + arguments: + - name: "--input" + __merge__: file_normalized.yaml + required: true + direction: input + - name: "--input_layer" + type: string + default: "normalized" + description: Which layer to use as input. + - name: "--output" + direction: output + __merge__: file_hvg.yaml + required: true + - name: "--var_hvg" + type: string + default: "hvg" + description: "In which .var slot to store whether a feature is considered to be hvg." + - name: "--var_hvg_score" + type: string + default: "hvg_score" + description: "In which .var slot to store the gene variance score (normalized dispersion)." + - name: "--num_features" + type: integer + default: 1000 + description: "The number of HVG to select" + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py diff --git a/src/datasets/api/comp_processor_knn.yaml b/src/datasets/api/comp_processor_knn.yaml new file mode 100644 index 0000000000..b0e16f8fc4 --- /dev/null +++ b/src/datasets/api/comp_processor_knn.yaml @@ -0,0 +1,39 @@ +functionality: + namespace: "datasets/processors" + info: + type: dataset_processor + type_info: + label: KNN + summary: | + Computes the k-nearest-neighbours for each cell. + description: | + The resulting AnnData will contain both the knn distances and the knn connectivities in 'obsp'. + arguments: + - name: "--input" + __merge__: file_pca.yaml + required: true + direction: input + - name: "--input_layer" + type: string + default: "normalized" + description: Which layer to use as input. + - name: "--output" + direction: output + __merge__: file_knn.yaml + required: true + - name: "--key_added" + type: string + default: "knn" + description: | + The neighbors data is added to `.uns[key_added]`, + distances are stored in `.obsp[key_added+'_distances']` and + connectivities in `.obsp[key_added+'_connectivities']`. + - name: "--num_neighbors" + type: integer + default: 15 + description: "The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation." + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py diff --git a/src/datasets/api/comp_processor_pca.yaml b/src/datasets/api/comp_processor_pca.yaml new file mode 100644 index 0000000000..a7ca82bc07 --- /dev/null +++ b/src/datasets/api/comp_processor_pca.yaml @@ -0,0 +1,49 @@ +functionality: + namespace: "datasets/processors" + info: + type: dataset_processor + type_info: + label: PCA + summary: | + Computes a PCA embedding of the normalized data. + description: + The resulting AnnData will contain an embedding in obsm, as well as optional loadings in 'varm'. 
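+  # The PCA is computed on the genes selected by --input_var_features
+  # (default "hvg", i.e. the boolean column produced by the HVG processor).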
+ arguments: + - name: "--input" + __merge__: file_hvg.yaml + required: true + direction: input + - name: "--input_layer" + type: string + default: "normalized" + description: Which layer to use as input. + - name: "--input_var_features" + type: string + description: Column name in .var matrix that will be used to select which genes to run the PCA on. + default: hvg + - name: "--output" + direction: output + __merge__: file_pca.yaml + required: true + - name: "--obsm_embedding" + type: string + default: "X_pca" + description: "In which .obsm slot to store the resulting embedding." + - name: "--varm_loadings" + type: string + default: "pca_loadings" + description: "In which .varm slot to store the resulting loadings matrix." + - name: "--uns_variance" + type: string + default: "pca_variance" + description: "In which .uns slot to store the resulting variance objects." + - name: "--num_components" + type: integer + example: 25 + description: Number of principal components to compute. Defaults to 50, or 1 - minimum dimension size of selected representation. + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + diff --git a/src/datasets/api/comp_processor_subset.yaml b/src/datasets/api/comp_processor_subset.yaml new file mode 100644 index 0000000000..bad64a6762 --- /dev/null +++ b/src/datasets/api/comp_processor_subset.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: "datasets/processors" + info: + type: dataset_processor + type_info: + label: Subset + summary: Sample cells and genes randomly. + description: This component subsets the layers, obs and var to create smaller test datasets. + arguments: + - name: "--input" + __merge__: file_common_dataset.yaml + required: true + direction: input + - name: "--input_mod2" + __merge__: file_common_dataset.yaml + direction: input + required: false + - name: "--output" + __merge__: file_common_dataset.yaml + direction: output + required: true + - name: "--output_mod2" + __merge__: file_common_dataset.yaml + direction: output + required: false + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + diff --git a/src/datasets/api/comp_processor_svd.yaml b/src/datasets/api/comp_processor_svd.yaml new file mode 100644 index 0000000000..91413c2624 --- /dev/null +++ b/src/datasets/api/comp_processor_svd.yaml @@ -0,0 +1,45 @@ +functionality: + namespace: "datasets/processors" + info: + type: dataset_processor + type_info: + label: SVD + summary: | + Computes a SVD PCA embedding of the normalized data. + description: + The resulting AnnData will contain an embedding in obsm. + arguments: + - name: "--input" + __merge__: file_normalized.yaml + required: true + direction: input + - name: "--input_mod2" + __merge__: file_normalized.yaml + required: false + direction: input + - name: "--input_layer" + type: string + default: "normalized" + description: Which layer to use as input. + - name: "--output" + direction: output + __merge__: file_svd.yaml + required: true + - name: "--output_mod2" + direction: output + __merge__: file_svd.yaml + required: false + - name: "--obsm_embedding" + type: string + default: "X_svd" + description: "In which .obsm slot to store the resulting embedding." + - name: "--num_components" + type: integer + default: 100 + description: Number of principal components to compute. 
Defaults to 100, or 1 - minimum dimension size of selected representation. + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + diff --git a/src/datasets/api/file_common_dataset.yaml b/src/datasets/api/file_common_dataset.yaml new file mode 100644 index 0000000000..ed7836bf5c --- /dev/null +++ b/src/datasets/api/file_common_dataset.yaml @@ -0,0 +1,9 @@ +__merge__: file_knn.yaml +type: file +example: "resources_test/common/pancreas/dataset.h5ad" +info: + label: "Common dataset" + summary: A dataset processed by the common dataset processing pipeline. + description: | + This dataset contains both raw counts and normalized data matrices, + as well as a PCA embedding, HVG selection and a kNN graph. diff --git a/src/datasets/api/file_hvg.yaml b/src/datasets/api/file_hvg.yaml new file mode 100644 index 0000000000..697be29e32 --- /dev/null +++ b/src/datasets/api/file_hvg.yaml @@ -0,0 +1,16 @@ +__merge__: file_normalized.yaml +type: file +example: "resources_test/common/pancreas/hvg.h5ad" +info: + label: "Dataset+HVG" + summary: "A normalised dataset with a PCA embedding and HVG selection." + slots: + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true diff --git a/src/datasets/api/file_knn.yaml b/src/datasets/api/file_knn.yaml new file mode 100644 index 0000000000..de7d2b8df5 --- /dev/null +++ b/src/datasets/api/file_knn.yaml @@ -0,0 +1,21 @@ +__merge__: file_pca.yaml +type: file +example: "resources_test/common/pancreas/knn.h5ad" +info: + label: "Dataset+HVG+PCA+kNN" + summary: "A normalised data with a PCA embedding, HVG selection and a kNN graph" + slots: + obsp: + - type: double + name: knn_distances + description: K nearest neighbors distance matrix. + required: true + - type: double + name: knn_connectivities + description: K nearest neighbors connectivities matrix. + required: true + uns: + - type: object + name: knn + description: Supplementary K nearest neighbors data. + required: true diff --git a/src/datasets/api/file_multimodal_dataset.yaml b/src/datasets/api/file_multimodal_dataset.yaml new file mode 100644 index 0000000000..daac29d77b --- /dev/null +++ b/src/datasets/api/file_multimodal_dataset.yaml @@ -0,0 +1,243 @@ +type: file +example: "resources_test/common/pancreas/dataset.h5ad" +info: + label: "Common dataset" + summary: A dataset processed by the common dataset processing pipeline. + description: | + This dataset contains both raw counts and normalized data matrices, + as well as a SVD embedding and a HVG selection. + + The format of this file is derived from the [CELLxGENE schema v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md). + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + + - type: double + name: normalized + description: Normalised expression values + required: true + obs: + - type: string + name: dataset_id + description: Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. + required: false + + - type: string + name: assay + description: Type of assay used to generate the cell data, indicating the methodology or technique employed. 
+ required: false + + - type: string + name: assay_ontology_term_id + description: Experimental Factor Ontology (`EFO:`) term identifier for the assay, providing a standardized reference to the assay type. + required: false + + - type: string + name: cell_type + description: Classification of the cell type based on its characteristics and function within the tissue or organism. + required: false + + - type: string + name: cell_type_ontology_term_id + description: Cell Ontology (`CL:`) term identifier for the cell type, offering a standardized reference to the specific cell classification. + required: false + + - type: string + name: development_stage + description: Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. + required: false + + - type: string + name: development_stage_ontology_term_id + description: | + Ontology term identifier for the developmental stage, providing a standardized reference to the organism's developmental phase. + + If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Developmental Stages (`HsapDv:`) ontology is used. + If the organism is mouse (`organism_ontology_term_id == 'NCBITaxon:10090'`), then the Mouse Developmental Stages (`MmusDv:`) ontology is used. + Otherwise, the Uberon (`UBERON:`) ontology is used. + required: false + + - type: string + name: disease + description: Information on any disease or pathological condition associated with the cell or donor. + required: false + + - type: string + name: disease_ontology_term_id + description: | + Ontology term identifier for the disease, enabling standardized disease classification and referencing. + + Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). + required: false + + - type: string + name: donor_id + description: Identifier for the donor from whom the cell sample is obtained. + required: false + + - type: boolean + name: is_primary_data + description: Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. + required: false + + - type: string + name: organism + description: Organism from which the cell sample is obtained. + required: false + + - type: string + name: organism_ontology_term_id + description: | + Ontology term identifier for the organism, providing a standardized reference for the organism. + + Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. + required: false + + - type: string + name: self_reported_ethnicity + description: Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. + required: false + + - type: string + name: self_reported_ethnicity_ontology_term_id + description: | + Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. + + If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. + required: false + + - type: string + name: sex + description: Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. + required: false + + - type: string + name: sex_ontology_term_id + description: Ontology term identifier for the biological sex, ensuring standardized classification of sex. 
Only `PATO:0000383`, `PATO:0000384` and `PATO:0001340` are allowed. + required: false + + - type: string + name: suspension_type + description: Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. + required: false + + - type: string + name: tissue + description: Specific tissue from which the cells were derived, key for context and specificity in cell studies. + required: false + + - type: string + name: tissue_ontology_term_id + description: | + Ontology term identifier for the tissue, providing a standardized reference for the tissue type. + + For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). + For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. + required: false + + - type: string + name: tissue_general + description: General category or classification of the tissue, useful for broader grouping and comparison of cell data. + required: false + + - type: string + name: tissue_general_ontology_term_id + description: | + Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. + + For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). + For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. + required: false + + - type: string + name: batch + description: A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. + required: false + + - type: integer + name: soma_joinid + description: If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the cell. + required: false + + - type: double + name: size_factors + description: The size factors created by the normalisation method, if any. + required: false + var: + - type: string + name: feature_id + description: Unique identifier for the feature, usually a ENSEMBL gene id. + # TODO: make this required once openproblems_v1 dataloader supports it + required: false + + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + # TODO: make this required once the dataloader supports it + required: true + + - type: integer + name: soma_joinid + description: If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the feature. + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + + obsm: + - type: double + name: X_svd + description: The resulting SVD embedding. + required: true + uns: + - type: string + name: dataset_id + description: A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived. + required: true + + - name: dataset_name + type: string + description: A human-readable name for the dataset. + required: true + + - type: string + name: dataset_url + description: Link to the original source of the dataset. 
+ required: false + + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + multiple: true + + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + multiple: true + + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/datasets/api/file_normalized.yaml b/src/datasets/api/file_normalized.yaml new file mode 100644 index 0000000000..ea6f14e9fb --- /dev/null +++ b/src/datasets/api/file_normalized.yaml @@ -0,0 +1,22 @@ +__merge__: file_raw.yaml +type: file +example: "resources_test/common/pancreas/normalized.h5ad" +info: + label: "Normalized dataset" + summary: "A normalized dataset" + slots: + layers: + - type: double + name: normalized + description: Normalised expression values + required: true + obs: + - type: double + name: size_factors + description: The size factors created by the normalisation method, if any. + required: false + uns: + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/datasets/api/file_pca.yaml b/src/datasets/api/file_pca.yaml new file mode 100644 index 0000000000..daa26618e1 --- /dev/null +++ b/src/datasets/api/file_pca.yaml @@ -0,0 +1,22 @@ +__merge__: file_hvg.yaml +type: file +example: "resources_test/common/pancreas/pca.h5ad" +info: + label: "Dataset+HVG+PCA" + summary: "A normalised dataset with a PCA embedding" + slots: + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + varm: + - type: double + name: pca_loadings + description: The PCA loadings matrix. + required: true + uns: + - type: double + name: pca_variance + description: The PCA variance objects. + required: true diff --git a/src/datasets/api/file_raw.yaml b/src/datasets/api/file_raw.yaml new file mode 100644 index 0000000000..7ffab3b43e --- /dev/null +++ b/src/datasets/api/file_raw.yaml @@ -0,0 +1,205 @@ +type: file +example: "resources_test/common/pancreas/raw.h5ad" +info: + label: "Raw dataset" + summary: An unprocessed dataset as output by a dataset loader. + description: | + This dataset contains raw counts and metadata as output by a dataset loader. + + The format of this file is derived from the [CELLxGENE schema v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md). + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + obs: + - type: string + name: dataset_id + description: Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. + required: false + + - type: string + name: assay + description: Type of assay used to generate the cell data, indicating the methodology or technique employed. + required: false + + - type: string + name: assay_ontology_term_id + description: Experimental Factor Ontology (`EFO:`) term identifier for the assay, providing a standardized reference to the assay type. + required: false + + - type: string + name: cell_type + description: Classification of the cell type based on its characteristics and function within the tissue or organism. 
+ required: false + + - type: string + name: cell_type_ontology_term_id + description: Cell Ontology (`CL:`) term identifier for the cell type, offering a standardized reference to the specific cell classification. + required: false + + - type: string + name: development_stage + description: Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. + required: false + + - type: string + name: development_stage_ontology_term_id + description: | + Ontology term identifier for the developmental stage, providing a standardized reference to the organism's developmental phase. + + If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Developmental Stages (`HsapDv:`) ontology is used. + If the organism is mouse (`organism_ontology_term_id == 'NCBITaxon:10090'`), then the Mouse Developmental Stages (`MmusDv:`) ontology is used. + Otherwise, the Uberon (`UBERON:`) ontology is used. + required: false + + - type: string + name: disease + description: Information on any disease or pathological condition associated with the cell or donor. + required: false + + - type: string + name: disease_ontology_term_id + description: | + Ontology term identifier for the disease, enabling standardized disease classification and referencing. + + Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). + required: false + + - type: string + name: donor_id + description: Identifier for the donor from whom the cell sample is obtained. + required: false + + - type: boolean + name: is_primary_data + description: Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. + required: false + + - type: string + name: organism + description: Organism from which the cell sample is obtained. + required: false + + - type: string + name: organism_ontology_term_id + description: | + Ontology term identifier for the organism, providing a standardized reference for the organism. + + Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. + required: false + + - type: string + name: self_reported_ethnicity + description: Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. + required: false + + - type: string + name: self_reported_ethnicity_ontology_term_id + description: | + Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. + + If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. + required: false + + - type: string + name: sex + description: Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. + required: false + + - type: string + name: sex_ontology_term_id + description: Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only `PATO:0000383`, `PATO:0000384` and `PATO:0001340` are allowed. + required: false + + - type: string + name: suspension_type + description: Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. 
+ required: false + + - type: string + name: tissue + description: Specific tissue from which the cells were derived, key for context and specificity in cell studies. + required: false + + - type: string + name: tissue_ontology_term_id + description: | + Ontology term identifier for the tissue, providing a standardized reference for the tissue type. + + For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). + For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. + required: false + + - type: string + name: tissue_general + description: General category or classification of the tissue, useful for broader grouping and comparison of cell data. + required: false + + - type: string + name: tissue_general_ontology_term_id + description: | + Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. + + For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). + For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. + required: false + + - type: string + name: batch + description: A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. + required: false + + - type: integer + name: soma_joinid + description: If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the cell. + required: false + var: + - type: string + name: feature_id + description: Unique identifier for the feature, usually a ENSEMBL gene id. + # TODO: make this required once openproblems_v1 dataloader supports it + required: false + + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + # TODO: make this required once the dataloader supports it + required: true + + - type: integer + name: soma_joinid + description: If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the feature. + required: false + uns: + - type: string + name: dataset_id + description: A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived. + required: true + - name: dataset_name + type: string + description: A human-readable name for the dataset. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + multiple: true + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. 
+ required: false + multiple: true diff --git a/src/datasets/api/file_svd.yaml b/src/datasets/api/file_svd.yaml new file mode 100644 index 0000000000..2a727369e3 --- /dev/null +++ b/src/datasets/api/file_svd.yaml @@ -0,0 +1,12 @@ +__merge__: file_normalized.yaml +type: file +example: "resources_test/common/pancreas/svd.h5ad" +info: + label: "Dataset+SVD" + summary: "A normalised dataset with a SVD embedding" + slots: + obsm: + - type: double + name: X_svd + description: The resulting SVD embedding. + required: true \ No newline at end of file diff --git a/src/datasets/loaders/cellxgene_census/config.vsh.yaml b/src/datasets/loaders/cellxgene_census/config.vsh.yaml new file mode 100644 index 0000000000..667e1c6a6b --- /dev/null +++ b/src/datasets/loaders/cellxgene_census/config.vsh.yaml @@ -0,0 +1,167 @@ +functionality: + name: cellxgene_census + namespace: datasets/loaders + description: | + Query cells from a CellxGene Census or custom TileDBSoma object. + Aside from fetching the cells' RNA counts (`.X`), cell metadata + (`.obs`) and gene metadata (`.var`), this component also fetches + the dataset metadata and joins it into the cell metadata. + argument_groups: + - name: Input database + description: "Open CellxGene Census by version or URI." + arguments: + - name: "--input_uri" + type: string + description: "If specified, a URI containing the Census SOMA objects. If specified, will take precedence over the `--census_version` argument." + required: false + example: "s3://bucket/path" + - name: "--census_version" + description: "Which release of CellxGene census to use. Possible values are \"latest\", \"stable\", or the date of one of the releases (e.g. \"2023-07-25\"). For more information, check the documentation on [Census data releases](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html)." + type: string + example: "stable" + required: false + - name: Cell query + description: Arguments related to the query. + arguments: + - name: "--species" + type: string + description: The organism to query, usually one of `Homo sapiens` or `Mus musculus`. + required: true + example: "homo_sapiens" + - name: "--obs_value_filter" + type: string + description: "Filter for selecting the `obs` metadata (i.e. cells). Value is a filter query written in the SOMA `value_filter` syntax." + required: true + example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'" + - name: Filter cells by grouping + description: + arguments: + - name: "--cell_filter_grouping" + type: string + description: | + A subset of 'obs' columns by which to group the cells for filtering. + Only groups surpassing or equal to the `--cell_filter_minimum_count` + threshold will be retained. Take care not to introduce a selection + bias against cells with more fine-grained ontology annotations. + required: false + example: ["dataset_id", "tissue", "assay", "disease", "cell_type"] + multiple: true + - name: "--cell_filter_minimum_count" + type: integer + description: | + A minimum number of cells per group to retain. If `--cell_filter_grouping` + is defined, this parameter should also be provided and vice versa. + required: false + example: 100 + - name: Count filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--cell_filter_min_genes" + type: integer + description: Remove cells with less than this number of genes. 
+ required: false + default: 50 + - name: "--cell_filter_min_counts" + type: integer + description: Remove cells with less than this number of counts. + required: false + default: 0 + - name: "--gene_filter_min_cells" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + default: 5 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + default: 0 + - name: Cell metadata + description: Cell metadata arguments + arguments: + - name: "--obs_batch" + type: string + description: | + Location of where to find the observation batch IDs. + + * If not specified, the `.obs["batch"]` field will not be included. + * If one or more values are specified, the `.obs["batch"]` field will be + set to the concatenated values of the specified fields, separated by + the `obs_batch_separator`. + required: false + multiple: true + multiple_sep: "," + example: ["batch"] + - name: "--obs_batch_separator" + type: string + description: Separator to use when concatenating the values of the `--obs_batch` fields. + required: false + default: "+" + - name: Dataset metadata + description: Information about the dataset that will be stored in the `.uns` slot. + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: true + - name: Outputs + description: Output arguments. + arguments: + - name: "--output" + type: file + description: Output h5ad file. 
+ direction: output + required: true + example: output.h5ad + - name: "--output_compression" + type: string + choices: ["gzip", "lzf"] + required: false + example: "gzip" + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/setup_logger.py + test_resources: + - type: python_script + path: test.py +platforms: + - type: docker + #image: openproblems/base_python:1.0.0 + image: python:3.11 + setup: + - type: python + packages: + - cellxgene-census + - scanpy + test_setup: + - type: python + packages: + - viashpy + - type: nextflow + directives: + label: [highmem, midcpu] \ No newline at end of file diff --git a/src/datasets/loaders/cellxgene_census/script.py b/src/datasets/loaders/cellxgene_census/script.py new file mode 100644 index 0000000000..49c44b6b32 --- /dev/null +++ b/src/datasets/loaders/cellxgene_census/script.py @@ -0,0 +1,190 @@ +import sys +import cellxgene_census +import scanpy as sc +import tiledbsoma as soma + +## VIASH START +par = { + "input_uri": None, + "census_version": "stable", + "species": "mus_musculus", + "obs_value_filter": "dataset_id == '49e4ffcc-5444-406d-bdee-577127404ba8'", + "cell_filter_grouping": None, + "cell_filter_minimum_count": None, + "obs_batch": [ "donor_id" ], + "obs_batch_separator": "+", + "dataset_name": "pretty name", + "dataset_url": "url", + "dataset_reference": "ref", + "dataset_summary": "summ", + "dataset_description": "desc", + "dataset_organism": "mus_musculus", + "output": "output.h5ad", + "output_compression": "gzip", +} +meta = {"resources_dir": "src/common/helper_functions"} +## VIASH END + +sys.path.append(meta["resources_dir"]) + +from setup_logger import setup_logger +logger = setup_logger() + +def connect_census(uri, census_version): + """ + Connect to CellxGene Census or user-provided TileDBSoma object + """ + ver = census_version or "stable" + logger.info("Connecting to CellxGene Census at %s", f"'{uri}'" if uri else f"version '{ver}'") + return cellxgene_census.open_soma(uri=uri, census_version=ver) + +def get_anndata(census_connection, par): + logger.info("Getting gene expression data based on `%s` query.", par["obs_value_filter"]) + # workaround for https://github.com/chanzuckerberg/cellxgene-census/issues/891 + return cellxgene_census.get_anndata( + census=census_connection, + obs_value_filter=par["obs_value_filter"], + organism=par["species"] + ) + + # exp = census_connection["census_data"][par["species"]] + # query = exp.axis_query( + # "RNA", + # obs_query=soma.AxisQuery(value_filter=par["obs_value_filter"]), + # var_query=soma.AxisQuery(), + # ) + + # n_obs = query.n_obs + # n_vars = query.n_vars + # logger.info(f"Query yields {n_obs} cells and {n_vars} genes.") + + # logger.info("Fetching obs.") + # obs = query.obs().concat().to_pandas() + + # logger.info("Fetching var.") + # var = query.var().concat().to_pandas() + + # logger.info("Fetching X.") + # X = query.X("raw") + # Xcoo = X.coos().concat() + # Xcoos = Xcoo.to_scipy().tocsr() + # Xcoos_subset = Xcoos[obs["soma_joinid"]] + + # logger.info("Creating AnnData object.") + # return sc.AnnData( + # layers={"counts": Xcoos_subset}, + # obs=obs, + # var=var + # ) + +def filter_min_cells_per_group(adata, par): + n_cells_before, _ = adata.shape + cell_count = adata.obs \ + .groupby(par["cell_filter_grouping"])["soma_joinid"] \ + .transform("count") \ + + adata = adata[cell_count >= par["cell_filter_minimum_count"]] + n_cells_after, _ = adata.shape + logger.info( + "Removed %s cells based on %s cell_filter_minimum_count of %s 
cell_filter_grouping." + % ((n_cells_before - n_cells_after), par["cell_filter_minimum_count"], par["cell_filter_grouping"]) + ) + return adata + +def filter_by_counts(adata, par): + logger.info("Remove cells with few counts and genes with few counts.") + n_cells_before, n_genes_before = adata.shape + # remove cells with few counts and genes with few counts + scanpy_proc = { + par["cell_filter_min_counts"]: (sc.pp.filter_cells, "min_counts"), + par["cell_filter_min_genes"]: (sc.pp.filter_cells, "min_genes"), + par["gene_filter_min_counts"]: (sc.pp.filter_genes, "min_counts"), + par["gene_filter_min_cells"]: (sc.pp.filter_genes, "min_cells"), + } + for threshold, (func, arg) in scanpy_proc.items(): + if threshold: + func(adata, **{arg: threshold}) + n_cells_after, n_genes_after = adata.shape + logger.info("Removed %s cells and %s genes.", (n_cells_before - n_cells_after), (n_genes_before - n_genes_after)) + +def move_x_to_layers(adata): + logger.info("Move .X to .layers['counts']") + adata.layers["counts"] = adata.X + adata.X = None + +def add_batch_to_obs(adata, par): + logger.info("Add batch to the AnnData object.") + if par["obs_batch"]: + # fetch batch columns from obs + cols = [adata.obs[key] for key in par["obs_batch"]] + + # join cols + obs_batch = [par["obs_batch_separator"].join(row) for row in zip(*cols)] + + # store in adata + adata.obs["batch"] = obs_batch + +def add_metadata_to_uns(adata, par): + logger.info("Add metadata to the AnnData object.") + for key in ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"]: + adata.uns[key] = par[key] + +def print_unique(adata, column): + formatted = "', '".join(adata.obs[column].unique()) + logger.info(f"Unique {column}: ['{formatted}']") + +def print_summary(adata): + logger.info(f"Resulting dataset: {adata}") + + logger.info("Summary of dataset:") + obs_fields = ["assay", "assay_ontology_term_id", "cell_type", "cell_type_ontology_term_id", "dataset_id", "development_stage", "development_stage_ontology_term_id", "disease", "disease_ontology_term_id", "tissue", "tissue_ontology_term_id", "tissue_general", "tissue_general_ontology_term_id"] + for field in obs_fields: + print_unique(adata, field) +def write_anndata(adata, par): + logger.info("Writing AnnData object to '%s'", par["output"]) + + adata.write_h5ad(par["output"], compression=par["output_compression"]) + +def main(par, meta): + # check arguments + if (par["cell_filter_grouping"] is None) != (par["cell_filter_minimum_count"] is None): + raise NotImplementedError( + "You need to specify either both or none of the following parameters: cell_filter_grouping, cell_filter_minimum_count" + ) + + with connect_census(uri=par["input_uri"], census_version=par["census_version"]) as conn: + adata = get_anndata(conn, par) + + print(f"AnnData: {adata}", flush=True) + + if par["cell_filter_grouping"] is not None: + adata = filter_min_cells_per_group(adata, par) + + # remove cells with few counts and genes with few counts + filter_by_counts(adata, par) + + # logger.log(f"Filtered AnnData: {adata}") + print(f"Filtered AnnData: {adata}", flush=True) + + # use feature_id as var_names + adata.var_names = adata.var["feature_id"] + + # not needed as long as we have our own implementation of `get_anndata` + # move .X to .layers["counts"] + move_x_to_layers(adata) + + # add batch to obs + add_batch_to_obs(adata, par) + + # add metadata to uns + add_metadata_to_uns(adata, par) + + # print summary + print_summary(adata) + + # write 
output to file + write_anndata(adata, par) + + +if __name__ == "__main__": + main(par, meta) diff --git a/src/datasets/loaders/cellxgene_census/test.py b/src/datasets/loaders/cellxgene_census/test.py new file mode 100644 index 0000000000..dba41bcc47 --- /dev/null +++ b/src/datasets/loaders/cellxgene_census/test.py @@ -0,0 +1,61 @@ +import sys +import os +import pytest +import anndata as ad +import numpy as np + +## VIASH START +meta = { + 'resources_dir': './resources_test/', + 'executable': './target/docker/query/cellxgene_census', + 'config': '/home/di/code/openpipeline/src/query/cellxgene_census/config.vsh.yaml' +} +## VIASH END + +def test_cellxgene_extract_metadata_expression(run_component, tmp_path): + output_file = tmp_path / "output.h5ad" + + run_component([ + "--species", "homo_sapiens", + "--obs_value_filter", "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'", + "--output", output_file, + "--obs_batch", "sex,sex", + "--dataset_id", "test_dataset_id", + "--dataset_name", "test_dataset_name", + "--dataset_url", "https://test_dataset_url.com", + "--dataset_reference", "test_dataset_reference", + "--dataset_summary", "test_dataset_summary", + "--dataset_description", "test_dataset_description", + "--dataset_organism", "test_homo_sapiens", + ]) + + # check whether file exists + assert os.path.exists(output_file), "Output file does not exist" + + adata = ad.read_h5ad(output_file) + + # check obs + assert not adata.obs.empty, ".obs should not be empty" + assert "is_primary_data" in adata.obs.columns + assert np.all(adata.obs["is_primary_data"] == True) + assert "cell_type_ontology_term_id" in adata.obs.columns + assert "disease" in adata.obs.columns + assert adata.n_obs > 10 + assert np.all([x in ["male+male", "female+female"] for x in adata.obs["batch"]]) + + # check var + assert "soma_joinid" in adata.var.columns + assert "feature_id" in adata.var.columns + + # check uns + assert adata.uns["dataset_id"] == "test_dataset_id", "Incorrect .uns['dataset_id']" + assert adata.uns["dataset_name"] == "test_dataset_name", "Incorrect .uns['dataset_name']" + assert adata.uns["dataset_url"] == "https://test_dataset_url.com", "Incorrect .uns['dataset_url']" + assert adata.uns["dataset_reference"] == "test_dataset_reference", "Incorrect .uns['dataset_reference']" + assert adata.uns["dataset_summary"] == "test_dataset_summary", "Incorrect .uns['dataset_summary']" + assert adata.uns["dataset_description"] == "test_dataset_description", "Incorrect .uns['dataset_description']" + assert adata.uns["dataset_organism"] == "test_homo_sapiens", "Incorrect .uns['dataset_organism']" + + +if __name__ == '__main__': + sys.exit(pytest.main([__file__])) diff --git a/src/datasets/loaders/cellxgene_census_from_source_h5ad/config.vsh.yaml b/src/datasets/loaders/cellxgene_census_from_source_h5ad/config.vsh.yaml new file mode 100644 index 0000000000..7ee4166d9d --- /dev/null +++ b/src/datasets/loaders/cellxgene_census_from_source_h5ad/config.vsh.yaml @@ -0,0 +1,130 @@ +functionality: + name: cellxgene_census_from_source_h5ad + namespace: datasets/loaders + description: | + Query cells from a CellxGene Census or custom TileDBSoma object. + Aside from fetching the cells' RNA counts (`.X`), cell metadata + (`.obs`) and gene metadata (`.var`), this component also fetches + the dataset metadata and joins it into the cell metadata. 
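+  # Unlike the generic cellxgene_census loader, this component downloads the
+  # original source h5ad for a single dataset id (cellxgene_census.download_source_h5ad)
+  # and then applies the same count filtering, batch annotation and dataset
+  # metadata steps to it.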
+ argument_groups: + - name: Input + description: Input arguments + arguments: + - name: "--input_id" + type: string + description: | + The dataset ID of the CellxGene Census dataset to query. + required: true + example: "a93eab58-3d82-4b61-8a2f-d7666dcdb7c4" + - name: Count filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--cell_filter_min_genes" + type: integer + description: Remove cells with less than this number of genes. + required: false + default: 50 + - name: "--cell_filter_min_counts" + type: integer + description: Remove cells with less than this number of counts. + required: false + default: 0 + - name: "--gene_filter_min_cells" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + default: 5 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + default: 0 + - name: Cell metadata + description: Cell metadata arguments + arguments: + - name: "--obs_batch" + type: string + description: | + Location of where to find the observation batch IDs. + + * If not specified, the `.obs["batch"]` field will not be included. + * If one or more values are specified, the `.obs["batch"]` field will be + set to the concatenated values of the specified fields, separated by + the `obs_batch_separator`. + required: false + multiple: true + multiple_sep: "," + example: ["batch"] + - name: "--obs_batch_separator" + type: string + description: Separator to use when concatenating the values of the `--obs_batch` fields. + required: false + default: "+" + - name: Dataset metadata + description: Information about the dataset that will be stored in the `.uns` slot. + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: true + - name: Outputs + description: Output arguments. + arguments: + - name: "--output" + type: file + description: Output h5ad file. 
+ direction: output + required: true + example: output.h5ad + - name: "--output_compression" + type: string + choices: ["gzip", "lzf"] + required: false + example: "gzip" + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/setup_logger.py + test_resources: + - type: python_script + path: test.py +platforms: + - type: docker + #image: openproblems/base_python:1.0.0 + image: python:3.11 + setup: + - type: python + packages: + - cellxgene-census + - scanpy + test_setup: + - type: python + packages: + - viashpy + - type: nextflow + directives: + label: [highmem, midcpu] \ No newline at end of file diff --git a/src/datasets/loaders/cellxgene_census_from_source_h5ad/script.py b/src/datasets/loaders/cellxgene_census_from_source_h5ad/script.py new file mode 100644 index 0000000000..900232e6a4 --- /dev/null +++ b/src/datasets/loaders/cellxgene_census_from_source_h5ad/script.py @@ -0,0 +1,131 @@ +import sys +import cellxgene_census +import scanpy as sc +import tempfile + +## VIASH START +par = { + "input_id": "0895c838-e550-48a3-a777-dbcd35d30272", + "obs_batch": [ "donor_id" ], + "obs_batch_separator": "+", + "dataset_name": "pretty name", + "dataset_url": "url", + "dataset_reference": "ref", + "dataset_summary": "summ", + "dataset_description": "desc", + "dataset_organism": "mus_musculus", + "output": "output.h5ad", + "output_compression": "gzip", +} +meta = {"resources_dir": "src/common/helper_functions"} +## VIASH END + +sys.path.append(meta["resources_dir"]) + +from setup_logger import setup_logger +logger = setup_logger() + +def get_anndata(par): + with tempfile.TemporaryDirectory() as tmp: + path = tmp + "/source.h5ad" + logger.info("Downloading source h5ad for dataset '%s' to '%s'.", par["input_id"], path) + cellxgene_census.download_source_h5ad(par["input_id"], path) + return sc.read_h5ad(path) + +def filter_by_counts(adata, par): + logger.info("Remove cells with few counts and genes with few counts.") + t0 = adata.shape + # remove cells with few counts and genes with few counts + if par["cell_filter_min_counts"]: + sc.pp.filter_cells(adata, min_counts=par["cell_filter_min_counts"]) + if par["cell_filter_min_genes"]: + sc.pp.filter_cells(adata, min_genes=par["cell_filter_min_genes"]) + if par["gene_filter_min_counts"]: + sc.pp.filter_genes(adata, min_counts=par["gene_filter_min_counts"]) + if par["gene_filter_min_cells"]: + sc.pp.filter_genes(adata, min_cells=par["gene_filter_min_cells"]) + t1 = adata.shape + logger.info("Removed %s cells and %s genes.", (t0[0] - t1[0]), (t0[1] - t1[1])) + +def move_x_to_layers(adata): + logger.info("Move .X to .layers['counts']") + adata.layers["counts"] = adata.X + adata.X = None + +def add_batch_to_obs(adata, par): + logger.info("Add batch to the AnnData object.") + if par["obs_batch"]: + # fetch batch columns from obs + cols = [adata.obs[key] for key in par["obs_batch"]] + + # join cols + obs_batch = [par["obs_batch_separator"].join(row) for row in zip(*cols)] + + # store in adata + adata.obs["batch"] = obs_batch + +def add_metadata_to_uns(adata, par): + logger.info("Add metadata to the AnnData object.") + for key in ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"]: + adata.uns[key] = par[key] + +def print_unique(adata, column): + if column not in adata.obs.columns: + logger.info(f"Column {column} not found in obs") + return + formatted = "', '".join(adata.obs[column].unique()) + logger.info(f"Unique {column}: ['{formatted}']") + 
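+# print_summary logs the unique values of the standard CELLxGENE obs annotations
+# (assay, cell type, development stage, disease, tissue and their ontology term
+# IDs); print_unique above skips any column that is missing from the source h5ad.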
+def print_summary(adata): + logger.info(f"Resulting dataset: {adata}") + + logger.info("Summary of dataset:") + print_unique(adata, "assay") + print_unique(adata, "assay_ontology_term_id") + print_unique(adata, "cell_type") + print_unique(adata, "cell_type_ontology_term_id") + print_unique(adata, "dataset_id") + print_unique(adata, "development_stage") + print_unique(adata, "development_stage_ontology_term_id") + print_unique(adata, "disease") + print_unique(adata, "disease_ontology_term_id") + print_unique(adata, "tissue") + print_unique(adata, "tissue_ontology_term_id") + print_unique(adata, "tissue_general") + print_unique(adata, "tissue_general_ontology_term_id") + +def write_anndata(adata, par): + logger.info("Writing AnnData object to '%s'", par["output"]) + + adata.write_h5ad(par["output"], compression=par["output_compression"]) + +def main(par, meta): + adata = get_anndata(par) + + logger.info("AnnData: %s", str(adata)) + + # remove cells with few counts and genes with few counts + filter_by_counts(adata, par) + + # this is not needed in source h5ads + # # use feature_id as var_names + # adata.var_names = adata.var["feature_id"] + + # move .X to .layers["counts"] + move_x_to_layers(adata) + + # add batch to obs + add_batch_to_obs(adata, par) + + # add metadata to uns + add_metadata_to_uns(adata, par) + + # print summary + print_summary(adata) + + # write output to file + write_anndata(adata, par) + + +if __name__ == "__main__": + main(par, meta) diff --git a/src/datasets/loaders/cellxgene_census_from_source_h5ad/test.py b/src/datasets/loaders/cellxgene_census_from_source_h5ad/test.py new file mode 100644 index 0000000000..098e8017a9 --- /dev/null +++ b/src/datasets/loaders/cellxgene_census_from_source_h5ad/test.py @@ -0,0 +1,58 @@ +import sys +import os +import pytest +import anndata as ad +import numpy as np + +## VIASH START +meta = { + 'resources_dir': './resources_test/', + 'executable': './target/docker/datasets/loaders/cellxgene_census_from_source_h5ad/cellxgene_census_from_source_h5ad', + 'config': 'src/query/cellxgene_census/config.vsh.yaml' +} +## VIASH END + +def test_cellxgene_extract_metadata_expression(run_component, tmp_path): + output_file = tmp_path / "output.h5ad" + + run_component([ + "--input_id", "0895c838-e550-48a3-a777-dbcd35d30272", + "--output", output_file, + "--obs_batch", "donor_id", + "--dataset_id", "test_dataset_id", + "--dataset_name", "test_dataset_name", + "--dataset_url", "https://test_dataset_url.com", + "--dataset_reference", "test_dataset_reference", + "--dataset_summary", "test_dataset_summary", + "--dataset_description", "test_dataset_description", + "--dataset_organism", "test_homo_sapiens", + ]) + + # check whether file exists + assert os.path.exists(output_file), "Output file does not exist" + + adata = ad.read_h5ad(output_file) + + # check obs + assert not adata.obs.empty, ".obs should not be empty" + assert "is_primary_data" in adata.obs.columns + assert "cell_type_ontology_term_id" in adata.obs.columns + assert "disease" in adata.obs.columns + assert adata.n_obs > 10 + assert np.all([x in ["C41", "C58", "C70", "C72"] for x in adata.obs["batch"]]) + + # check var + assert "feature_name" in adata.var.columns + + # check uns + assert adata.uns["dataset_id"] == "test_dataset_id", "Incorrect .uns['dataset_id']" + assert adata.uns["dataset_name"] == "test_dataset_name", "Incorrect .uns['dataset_name']" + assert adata.uns["dataset_url"] == "https://test_dataset_url.com", "Incorrect .uns['dataset_url']" + assert adata.uns["dataset_reference"] 
== "test_dataset_reference", "Incorrect .uns['dataset_reference']" + assert adata.uns["dataset_summary"] == "test_dataset_summary", "Incorrect .uns['dataset_summary']" + assert adata.uns["dataset_description"] == "test_dataset_description", "Incorrect .uns['dataset_description']" + assert adata.uns["dataset_organism"] == "test_homo_sapiens", "Incorrect .uns['dataset_organism']" + + +if __name__ == '__main__': + sys.exit(pytest.main([__file__])) diff --git a/src/datasets/loaders/openproblems_neurips2021_bmmc/config.vsh.yaml b/src/datasets/loaders/openproblems_neurips2021_bmmc/config.vsh.yaml new file mode 100644 index 0000000000..96dad30e76 --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2021_bmmc/config.vsh.yaml @@ -0,0 +1,74 @@ +functionality: + name: "openproblems_neurips2021_bmmc" + namespace: "datasets/loaders" + description: "Fetch a dataset from the OpenProblems NeurIPS2021 competition" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + type: file + description: Processed h5ad file published at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122. + required: true + example: GSE194122_openproblems_neurips2021_cite_BMMC_processed.h5ad + - name: "--mod1" + type: string + description: Name of the first modality. + required: true + example: GEX + - name: "--mod2" + type: string + description: Name of the second modality. + required: true + example: ADT + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: "A unique identifier for the dataset" + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. 
+ required: false + - name: Outputs + arguments: + - name: "--output_mod1" + __merge__: ../../api/file_raw.yaml + direction: "output" + - name: "--output_mod2" + __merge__: ../../api/file_raw.yaml + direction: "output" + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test.py + # - type: file + # path: /resources_test/common/openproblems_neurips2021/neurips2021_bmmc_cite.h5ad +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [highmem, midcpu, midtime] \ No newline at end of file diff --git a/src/datasets/loaders/openproblems_neurips2021_bmmc/script.py b/src/datasets/loaders/openproblems_neurips2021_bmmc/script.py new file mode 100644 index 0000000000..de62f039f6 --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2021_bmmc/script.py @@ -0,0 +1,126 @@ +import anndata as ad +import pandas as pd +import numpy as np +from scipy import sparse + +## VIASH START +par = { + "input": "GSE194122_openproblems_neurips2021_cite_BMMC_processed.h5ad", + "mod1": "GEX", + "mod2": "ATAC", + "dataset_id": "openproblems/neurips2021_bmmc", + "dataset_name": "BMMC (CITE-seq)", + "dataset_url": "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122", + "dataset_reference": "Neurips", + "dataset_summary": "value", + "dataset_description": "value", + "dataset_organism": "homo_sapiens", + "output_mod1": "output/mod1.h5ad", + "output_mod2": "output/mod2.h5ad" +} +meta = { + "functionality_name": "openproblems_neurips2021_bmmc", + "resources_dir": "/tmp/viash_inject_openproblems_neurips2021_bmmc14365472827677740971", +} +## VIASH END + +def remove_mod_col(df, mod): + df.drop(list(df.filter(like=mod)), axis=1, inplace=True) + +def remove_mod_prefix(df, mod): + suffix = f"{mod}_" + df.columns = df.columns.str.removeprefix(suffix) + +def convert_matrix(adata): + for key in adata: + if isinstance(adata[key], sparse.csr_matrix): + adata[key] = sparse.csc_matrix(adata[key]) + + +print("load dataset file", flush=True) +adata = ad.read_h5ad(par["input"]) + +# Convert to sparse csc_matrix +convert_matrix(adata.layers) +convert_matrix(adata.obsm) + +# Add is_train to obs if it is missing +if "is_train" not in adata.obs.columns: + batch_info = adata.obs["batch"] + batch_categories = batch_info.dtype.categories + # From https://github.com/openproblems-bio/neurips2021_multimodal_viash/blob/75281c039ab98b459edbf52058a18597e710ed4d/src/common/datasets/process_inhouse_datasets/script.R#L14-L17 + train = ["s1d1", "s1d2", "s2d1", "s2d4", "s3d1", "s3d6", "s3d7"] + adata.obs["is_train"] = [ "train" if x in train else "test" for x in batch_info ] + +# Construct Modality datasets +print("Construct Mod datasets", flush=True) +mask_mod1 = adata.var['feature_types'] == par["mod1"] +mask_mod2 = adata.var['feature_types'] == par["mod2"] + +adata_mod1 = adata[:, mask_mod1] +adata_mod2 = adata[:, mask_mod2] + +# Remove other modality data from obs and var +mod1_var = pd.DataFrame(adata_mod1.var) +remove_mod_col(mod1_var, par["mod2"]) +remove_mod_prefix(mod1_var, par["mod1"]) +mod1_var.index.name = "feature_name" +mod1_var.reset_index("feature_name", inplace=True) +mod1_var["feature_id"] = np.where(mod1_var.gene_id.isna(), mod1_var.feature_name, mod1_var.gene_id.astype(str)) +mod1_var.drop("gene_id", axis=1, inplace=True) +mod1_var.set_index("feature_id", drop=False, inplace=True) + +mod1_obs = pd.DataFrame(adata_mod1.obs) +remove_mod_col(mod1_obs, par["mod2"]) +remove_mod_prefix(mod1_obs, par["mod1"]) + 
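+# Attach the cleaned var/obs tables to the first modality, drop the .uns entries
+# that belong to the other modality (stripping this modality's prefix from the
+# rest), and remove .obsm and .X; the counts remain available in .layers.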
+adata_mod1.var = mod1_var +adata_mod1.obs = mod1_obs + +adata_mod1.uns = { key.replace(f"{par['mod1']}_", ""): value for key, value in adata.uns.items() if not key.startswith(par['mod2'])} +del adata_mod1.obsm +del adata_mod1.X + +mod2_var = pd.DataFrame(adata_mod2.var) +remove_mod_col(mod2_var, par["mod1"]) +remove_mod_prefix(mod2_var, par["mod2"]) +mod2_var.index.name = "feature_name" +mod2_var.reset_index("feature_name", inplace=True) +mod2_var["feature_id"] = np.where(mod2_var.gene_id.isna(), mod2_var.feature_name, mod2_var.gene_id.astype(str)) +mod2_var.drop("gene_id", axis=1, inplace=True) +mod2_var.set_index("feature_id", drop=False, inplace=True) + +mod2_obs = pd.DataFrame(adata_mod2.obs) +remove_mod_col(mod2_obs, par["mod1"]) +remove_mod_prefix(mod2_obs, par["mod2"]) + +adata_mod2.var = mod2_var +adata_mod2.obs = mod2_obs + +adata_mod2.uns = { key.replace(f"{par['mod2']}_", ""): value for key, value in adata.uns.items() if not key.startswith(par['mod1'])} +if par["mod2"] == "ATAC": + adata_mod2.obsm = { key.replace(f"{par['mod2']}_", ""): value for key, value in adata_mod2.obsm.items() if key.startswith(par['mod2'])} +else: + del adata_mod2.obsm + + +del adata_mod2.X + +print("Add metadata to uns", flush=True) +metadata_fields = [ + "dataset_id", "dataset_name", "dataset_url", "dataset_reference", + "dataset_summary", "dataset_description", "dataset_organism" +] +for key in metadata_fields: + if key in par: + print(f" Setting .uns['{key}']", flush=True) + adata_mod1.uns[key] = par[key] + adata_mod2.uns[key] = par[key] + +print("Writing adata to file", flush=True) +adata_mod1.write_h5ad(par["output_mod1"], compression="gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression="gzip") + + + + diff --git a/src/datasets/loaders/openproblems_neurips2021_bmmc/test.py b/src/datasets/loaders/openproblems_neurips2021_bmmc/test.py new file mode 100644 index 0000000000..b194a52fe4 --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2021_bmmc/test.py @@ -0,0 +1,93 @@ +from os import path +import subprocess +import anndata as ad + +input = "neurips2021_bmmc_cite.h5ad" +mod1 = "GEX" +mod2 = "ADT" + +output_mod1_file = "output_mod1.h5ad" +output_mod2_file = "output_mod2.h5ad" + +input_url = "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fcite%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + +# download input +print(">> Downloading input", flush=True) +out = subprocess.run( + [ + "wget", + "-O", input + ".gz", + input_url, + ], + stderr=subprocess.STDOUT +) +# unzip input +print(">> Unzipping input", flush=True) +out = subprocess.run( + [ + "gunzip", + input + ".gz", + ], + stderr=subprocess.STDOUT +) + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta["executable"], + "--input", input, + "--mod1", mod1, + "--mod2", mod2, + "--output_mod1", output_mod1_file, + "--output_mod2", output_mod2_file, + "--dataset_id", "openproblems/neurips2021_bmmc", + "--dataset_name", "BMMC (Multiome)", + "--dataset_url", "http://foo.org", + "--dataset_reference", "foo2000bar", + "--dataset_summary", "A short summary.", + "--dataset_description", "A couple of paragraphs worth of text.", + "--dataset_organism", "homo_sapiens", + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether files exist", flush=True) +assert path.exists(output_mod1_file), "Output mod1 file does 
not exist" +assert path.exists(output_mod2_file), "Output mod2 file does not exist" + +print(">> Read output anndata", flush=True) +output_mod1 = ad.read_h5ad(output_mod1_file) +output_mod2 = ad.read_h5ad(output_mod2_file) + +print(f"output_mod1: {output_mod1}", flush=True) +print(f"output_mod2: {output_mod2}", flush=True) + +print(">> Check that output mod1 fits expected API", flush=True) +assert output_mod1.X is None, ".X is not None/empty in mod 1 output" +assert "counts" in output_mod1.layers, "'counts' not found in mod 1 output layers" +assert "cell_type" in output_mod1.obs.columns, "cell_type column not found in mod 1 output obs" +assert "batch" in output_mod1.obs.columns, "batch column not found in mod 1 output obs" +assert output_mod1.uns["dataset_name"] == "BMMC (Multiome)", "Expected: Pancreas as value for dataset_name in mod 1 output uns" +assert output_mod1.uns["dataset_url"] == "http://foo.org", "Expected: http://foo.org as value for dataset_url in mod 1 output uns" +assert output_mod1.uns["dataset_reference"] == "foo2000bar", "Expected: foo2000bar as value for dataset_reference in mod 1 output uns" +assert output_mod1.uns["dataset_summary"] == "A short summary.", "Expected: A short summary. as value for dataset_summary in mod 1 output uns" +assert output_mod1.uns["dataset_description"] == "A couple of paragraphs worth of text.", "Expected: A couple of paragraphs worth of text. as value for dataset_description in mod 1 output uns" + + +print(">> Check that output mod2 fits expected API", flush=True) +assert output_mod2.X is None, ".X is not None/empty in mod 2 output" +assert "counts" in output_mod2.layers, "'counts' not found in mod 2 output layers" +assert "cell_type" in output_mod2.obs.columns, "cell_type column not found in mod 2 output obs" +assert "batch" in output_mod2.obs.columns, "batch column not found in mod 2 output obs" +assert output_mod2.uns["dataset_name"] == "BMMC (Multiome)", "Expected: Pancreas as value for dataset_name in mod 2 output uns" +assert output_mod2.uns["dataset_url"] == "http://foo.org", "Expected: http://foo.org as value for dataset_url in mod 2 output uns" +assert output_mod2.uns["dataset_reference"] == "foo2000bar", "Expected: foo2000bar as value for dataset_reference in mod 2 output uns" +assert output_mod2.uns["dataset_summary"] == "A short summary.", "Expected: A short summary. as value for dataset_summary in mod 2 output uns" +assert output_mod2.uns["dataset_description"] == "A couple of paragraphs worth of text.", "Expected: A couple of paragraphs worth of text. as value for dataset_description in mod 2 output uns" \ No newline at end of file diff --git a/src/datasets/loaders/openproblems_neurips2022_pbmc/config.vsh.yaml b/src/datasets/loaders/openproblems_neurips2022_pbmc/config.vsh.yaml new file mode 100644 index 0000000000..b2141482f1 --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2022_pbmc/config.vsh.yaml @@ -0,0 +1,80 @@ +functionality: + name: "openproblems_neurips2022_pbmc" + namespace: "datasets/loaders" + description: "Fetch a dataset from the OpenProblems NeurIPS2022 competition" + argument_groups: + - name: Inputs + arguments: + - name: "--input_mod1" + type: file + description: "Processed RNA h5ad file" + required: true + example: cite_rna_merged.h5ad + - name: "--input_mod2" + type: file + description: "Processed ADT or ATAC h5ad file" + required: true + example: cite_prot_merged.h5ad + - name: "--mod1" + type: string + description: Name of the first modality. 
+ required: true + example: GEX + - name: "--mod2" + type: string + description: Name of the second modality. + required: true + example: ADT + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: "A unique identifier for the dataset" + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Outputs + arguments: + - name: "--output_mod1" + __merge__: ../../api/file_raw.yaml + direction: "output" + - name: "--output_mod2" + __merge__: ../../api/file_raw.yaml + direction: "output" + resources: + - type: python_script + path: script.py + # skip unit test until data is public + # test_resources: + # - type: python_script + # path: test.py + # - type: file + # path: /resources_test/common/openproblems_neurips2021/neurips2021_bmmc_cite.h5ad +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ highmem, midcpu, midtime] \ No newline at end of file diff --git a/src/datasets/loaders/openproblems_neurips2022_pbmc/script.py b/src/datasets/loaders/openproblems_neurips2022_pbmc/script.py new file mode 100644 index 0000000000..d0dd855b55 --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2022_pbmc/script.py @@ -0,0 +1,94 @@ +import anndata as ad +from scipy import sparse + +## VIASH START +par = { + "input_mod1": "cite_rna_merged.h5ad", + "input_mod2": "cite_prot_merged.h5ad", + "mod1": "GEX", + "mod2": "ADT", + "dataset_id": "openproblems/neurips2022_pbmc", + "dataset_name": "Kaggle22 PBMC (CITE-seq)", + "dataset_url": "https://www.kaggle.com/competitions/open-problems-multimodal/data", + "dataset_reference": "Neurips22", + "dataset_summary": "Neurips22 competition dataset", + "dataset_description": "The dataset for this competition comprises single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors.", + "dataset_organism": "homo_sapiens", + "output_mod1": "output/mod1.h5ad", + "output_mod2": "output/mod2.h5ad" +} +meta = { + "functionality_name": "openproblems_neurips2022_pbmc", +} +## VIASH END + + +def convert_matrix(adata): + for key in adata: + if isinstance(adata[key], sparse.csr_matrix): + adata[key] = sparse.csc_matrix(adata[key]) + + +print("load dataset modality 1 file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) + +print("load dataset modality 2 file", flush=True) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + +# Convert to sparse csc_matrix +convert_matrix(adata_mod1.layers) +convert_matrix(adata_mod1.obsm) +convert_matrix(adata_mod2.layers) +convert_matrix(adata_mod2.obsm) + + +# Add is_train to obs (modality 1) +if "is_train" not in adata_mod1.obs.columns: + split_info = adata_mod1.obs["kaggle_dataset"] + train_sets = ["train", "test_public"] + adata_mod1.obs["is_train"] = [ "train" if x in train_sets else "test" for x in 
split_info ] + +# Add is_train to obs if it is missing (modality 2) +if "is_train" not in adata_mod2.obs.columns: + split_info = adata_mod2.obs["kaggle_dataset"] + train_sets = ["train", "test_public"] + adata_mod2.obs["is_train"] = [ "train" if x in train_sets else "test" for x in split_info ] + + +# split up index in modality 1 into feature ID and feature name +adata_mod1.var['feature_id'] = [str(s).split('_')[0] for s in adata_mod1.var.index.tolist()] +# TODO: index does not always contain an underscore. +if "_" in adata_mod1.var.index[0]: + adata_mod1.var['feature_name'] = [str(s).split('_')[1] for s in adata_mod1.var.index.tolist()] +adata_mod1.var.set_index('feature_id',drop=False, inplace=True) + +# set feature_name (proteins have only partial ensemble IDs)) +adata_mod2.var['feature_id'] = adata_mod2.var.index.tolist() # feature id needs to be filled in +adata_mod2.var['feature_name'] = adata_mod2.var.index.tolist() +adata_mod2.var.set_index('feature_name',drop=False, inplace=True) + + +# remove adata.X +del adata_mod1.X +del adata_mod2.X + + +print("Add metadata to uns", flush=True) +metadata_fields = [ + "dataset_id", "dataset_name", "dataset_url", "dataset_reference", + "dataset_summary", "dataset_description", "dataset_organism" +] +for key in metadata_fields: + if key in par: + print(f" Setting .uns['{key}']", flush=True) + adata_mod1.uns[key] = par[key] + adata_mod2.uns[key] = par[key] + + +print("Writing adata to file", flush=True) +adata_mod1.write_h5ad(par["output_mod1"], compression="gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression="gzip") + + + + diff --git a/src/datasets/loaders/openproblems_neurips2022_pbmc/test.py b/src/datasets/loaders/openproblems_neurips2022_pbmc/test.py new file mode 100644 index 0000000000..3bb5c677eb --- /dev/null +++ b/src/datasets/loaders/openproblems_neurips2022_pbmc/test.py @@ -0,0 +1,100 @@ +from os import path +import subprocess +import anndata as ad + +# TODO: update once data is public + +input_mod1 = "cite_rna_merged.h5ad" #change data set path after loading manually? +input_mod2 = "cite_prot_merged.h5ad" #change data set path after loading manually? 
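+# NOTE: the merged CITE-seq inputs are not yet public (see the commented-out
+# `aws s3 cp` download steps below), so this test assumes both h5ad files have
+# been placed in the working directory beforehand.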
+mod1 = "GEX" +mod2 = "ADT" + +output_mod1_file = "output_mod1.h5ad" +output_mod2_file = "output_mod2.h5ad" + +input_url_mod1 = "s3://openproblems-nextflow/datasets_private/neurips2022/cite_rna_merged.h5ad" +input_url_mod2 = "s3://openproblems-nextflow/datasets_private/neurips2022/cite_prot_merged.h5ad" + +# download input +# print(">> Downloading input modality 1", flush=True) +# out = subprocess.run( +# [ +# "aws s3 cp", +# "-O", input_mod1, +# input_url_mod1, +# ], +# stderr=subprocess.STDOUT +# ) + +# print(">> Downloading input modality 2", flush=True) +# out = subprocess.run( +# [ +# "aws s3 cp", +# "-O", input_mod2, +# input_url_mod2, +# ], +# stderr=subprocess.STDOUT +# ) + + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta["executable"], + "--input_mod1", input_mod1, + "--input_mod2", input_mod2, + "--mod1", mod1, + "--mod2", mod2, + "--output_mod1", output_mod1_file, + "--output_mod2", output_mod2_file, + "--dataset_id", "openproblems/neurips2021_bmmc", + "--dataset_name", "Kaggle22 PBMC (CITE-seq)", + "--dataset_url", "https://www.kaggle.com/competitions/open-problems-multimodal/data", + "--dataset_reference", "Neurips22", + "--dataset_summary", "Neurips22 competition dataset", + "--dataset_description", "The dataset for this competition comprises single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors.", + "--dataset_organism", "homo_sapiens", + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether files exist", flush=True) +assert path.exists(output_mod1_file), "Output mod1 file does not exist" +assert path.exists(output_mod2_file), "Output mod2 file does not exist" + +print(">> Read output anndata", flush=True) +output_mod1 = ad.read_h5ad(output_mod1_file) +output_mod2 = ad.read_h5ad(output_mod2_file) + +print(f"output_mod1: {output_mod1}", flush=True) +print(f"output_mod2: {output_mod2}", flush=True) + +print(">> Check that output mod1 fits expected API", flush=True) +assert output_mod1.X is None, ".X is not None/empty in mod 1 output" +assert "counts" in output_mod1.layers, "'counts' not found in mod 1 output layers" +assert "cell_type" in output_mod1.obs.columns, "cell_type column not found in mod 1 output obs" +assert "batch" in output_mod1.obs.columns, "batch column not found in mod 1 output obs" +assert output_mod1.uns["dataset_name"] == "Kaggle22 PBMC (CITE-seq)", "Expected: Kaggle22 PBMC (CITE-seq) as value for dataset_name in mod 1 output uns" +assert output_mod1.uns["dataset_url"] == "https://www.kaggle.com/competitions/open-problems-multimodal/data", "Expected: https://www.kaggle.com/competitions/open-problems-multimodal/data as value for dataset_url in mod 1 output uns" +assert output_mod1.uns["dataset_reference"] == "Neurips22", "Expected: Neurips22 as value for dataset_reference in mod 1 output uns" +assert output_mod1.uns["dataset_summary"] == "Neurips22 competition dataset", "Expected: Neurips22 competition dataset as value for dataset_summary in mod 1 output uns" +assert output_mod1.uns["dataset_description"] == "The dataset for this competition comprises single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors.", "Expected: The dataset for this competition comprises 
single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors. as value for dataset_description in mod 1 output uns" + + +print(">> Check that output mod2 fits expected API", flush=True) +assert output_mod2.X is None, ".X is not None/empty in mod 2 output" +assert "counts" in output_mod2.layers, "'counts' not found in mod 2 output layers" +assert "cell_type" in output_mod2.obs.columns, "cell_type column not found in mod 2 output obs" +assert "batch" in output_mod2.obs.columns, "batch column not found in mod 2 output obs" +assert output_mod2.uns["dataset_name"] == "Kaggle22 PBMC (CITE-seq)", "Expected: Kaggle22 PBMC (CITE-seq) as value for dataset_name in mod 2 output uns" +assert output_mod2.uns["dataset_url"] == "https://www.kaggle.com/competitions/open-problems-multimodal/data", "Expected: https://www.kaggle.com/competitions/open-problems-multimodal/data as value for dataset_url in mod 2 output uns" +assert output_mod2.uns["dataset_reference"] == "Neurips22", "Expected: Neurips22 as value for dataset_reference in mod 2 output uns" +assert output_mod2.uns["dataset_summary"] == "Neurips22 competition dataset", "Expected: Neurips22 competition dataset as value for dataset_summary in mod 2 output uns" +assert output_mod2.uns["dataset_description"] == "The dataset for this competition comprises single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors.", "Expected: The dataset for this competition comprises single-cell multiomics data collected from mobilized peripheral CD34+ hematopoietic stem and progenitor cells (HSPCs) isolated from four healthy human donors. as value for dataset_description in mod 2 output uns" \ No newline at end of file diff --git a/src/datasets/loaders/openproblems_v1/config.vsh.yaml b/src/datasets/loaders/openproblems_v1/config.vsh.yaml new file mode 100644 index 0000000000..d3a3ad846f --- /dev/null +++ b/src/datasets/loaders/openproblems_v1/config.vsh.yaml @@ -0,0 +1,86 @@ +__merge__: ../../api/comp_dataset_loader.yaml +functionality: + name: "openproblems_v1" + description: "Fetch a dataset from OpenProblems v1" + argument_groups: + - name: Inputs + arguments: + - name: "--input_id" + type: "string" + description: "The ID of the dataset in OpenProblems v1" + required: true + - name: "--obs_cell_type" + type: "string" + description: "Location of where to find the observation cell types." + - name: "--obs_batch" + type: "string" + description: "Location of where to find the observation batch IDs." + - name: "--obs_tissue" + type: "string" + description: "Location of where to find the observation tissue information." + - name: "--layer_counts" + type: "string" + description: "In which layer to find the counts matrix. Leave undefined to use `.X`." + example: counts + - name: "--sparse" + type: boolean + default: true + description: Convert layers to a sparse CSR format. + - name: "--var_feature_id" + type: "string" + description: "Location of where to find the feature IDs. Can be set to index if the feature IDs are the index." + example: gene_ids + - name: "--var_feature_name" + type: "string" + description: "Location of where to find the feature names. Can be set to index if the feature names are the index." + default: index + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. 
+ required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: apt + packages: git + - type: docker + run: | + git clone -b 'v0.8.0' --depth 1 https://github.com/openproblems-bio/openproblems.git /opt/openproblems && \ + pip install --no-cache-dir -r /opt/openproblems/docker/openproblems/requirements.txt && \ + pip install --no-cache-dir --editable /opt/openproblems + - type: nextflow + directives: + label: [highmem, midcpu , midtime] diff --git a/src/datasets/loaders/openproblems_v1/script.py b/src/datasets/loaders/openproblems_v1/script.py new file mode 100644 index 0000000000..2cdae43a74 --- /dev/null +++ b/src/datasets/loaders/openproblems_v1/script.py @@ -0,0 +1,128 @@ +from typing import Any, Callable, Dict, Tuple +import openproblems as op +import scanpy as sc +import scipy + +## VIASH START +par = { + "input_id": "pancreas", + "dataset_id": "pancreas", + "obs_cell_type": "cell_type", + "obs_batch": "tech", + "obs_tissue": "tissue", + "layer_counts": "counts", + "output": "test_data.h5ad", +} +meta = { + "resources_dir": "src/datasets/loaders/openproblems_v1/" +} +## VIASH END + +# make dataset lookup table +# If need be, this could be stored in a separate yaml file +dataset_funs: Dict[str, Tuple[Callable, Dict[str, Any]]] = { + "allen_brain_atlas": (op.data.allen_brain_atlas.load_mouse_brain_atlas, {}), + "cengen": (op.data.cengen.load_cengen, {}), + "immune_cells": (op.data.immune_cells.load_immune, {}), + "mouse_blood_olsson_labelled": (op.data.mouse_blood_olsson_labelled.load_olsson_2016_mouse_blood, {}), + "mouse_hspc_nestorowa2016": (op.data.mouse_hspc_nestorowa2016.load_mouse_hspc_nestorowa2016, {}), + "pancreas": (op.data.pancreas.load_pancreas, {}), + # "tabula_muris_senis": op.data.tabula_muris_senis.load_tabula_muris_senis, + "tabula_muris_senis_droplet_lung": ( + op.data.tabula_muris_senis.load_tabula_muris_senis, + {"organ_list": ["lung"], "method_list": ["droplet"]} + ), + "tenx_1k_pbmc": (op.data.tenx.load_tenx_1k_pbmc, {}), + "tenx_5k_pbmc": (op.data.tenx.load_tenx_5k_pbmc, {}), + "tnbc_wu2021": (op.data.tnbc_wu2021.load_tnbc_data, {}), + "zebrafish": (op.data.zebrafish.load_zebrafish, {}) +} + +# fetch dataset +dataset_fun, kwargs = dataset_funs[par["input_id"]] + +print("Fetch dataset", flush=True) +adata = dataset_fun(**kwargs) + +# override values one by one because adata.uns and +# metadata are two different classes. 
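+# For the dataset_* fields, any values passed via par are applied later
+# (see the "Add metadata to uns" step below) and take precedence over what is
+# set here.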
+for key, value in dataset_fun.metadata.items():
+    print(f"Setting .uns['{key}']", flush=True)
+    adata.uns[key] = value
+
+print("Setting .obs['cell_type']", flush=True)
+if par["obs_cell_type"]:
+    if par["obs_cell_type"] in adata.obs:
+        adata.obs["cell_type"] = adata.obs[par["obs_cell_type"]]
+    else:
+        print(f"Warning: key '{par['obs_cell_type']}' could not be found in adata.obs.", flush=True)
+
+print("Setting .obs['batch']", flush=True)
+if par["obs_batch"]:
+    if par["obs_batch"] in adata.obs:
+        adata.obs["batch"] = adata.obs[par["obs_batch"]]
+    else:
+        print(f"Warning: key '{par['obs_batch']}' could not be found in adata.obs.", flush=True)
+
+print("Setting .obs['tissue']", flush=True)
+if par["obs_tissue"]:
+    if par["obs_tissue"] in adata.obs:
+        adata.obs["tissue"] = adata.obs[par["obs_tissue"]]
+    else:
+        print(f"Warning: key '{par['obs_tissue']}' could not be found in adata.obs.", flush=True)
+
+if par["layer_counts"] and par["layer_counts"] in adata.layers:
+    print(f"Temporarily moving .layers['{par['layer_counts']}'] to .X", flush=True)
+    adata.X = adata.layers[par["layer_counts"]]
+    del adata.layers[par["layer_counts"]]
+
+if par["sparse"] and not scipy.sparse.issparse(adata.X):
+    print("Make counts sparse", flush=True)
+    adata.X = scipy.sparse.csr_matrix(adata.X)
+
+print("Removing empty genes", flush=True)
+sc.pp.filter_genes(adata, min_cells=1)
+
+print("Removing empty cells", flush=True)
+sc.pp.filter_cells(adata, min_counts=2)
+
+print("Moving .X to .layers['counts']", flush=True)
+adata.layers["counts"] = adata.X
+del adata.X
+
+print("Add metadata to uns", flush=True)
+metadata_fields = [
+    "dataset_id", "dataset_name", "dataset_url", "dataset_reference",
+    "dataset_summary", "dataset_description", "dataset_organism"
+]
+uns_metadata = {
+    id: par[id]
+    for id in metadata_fields
+    if id in par
+}
+adata.uns.update(uns_metadata)
+
+print("Setting .var['feature_name']", flush=True)
+
+if par["var_feature_name"] == "index":
+    adata.var["feature_name"] = adata.var.index
+else:
+    if par["var_feature_name"] in adata.var:
+        adata.var["feature_name"] = adata.var[par["var_feature_name"]]
+        del adata.var[par["var_feature_name"]]
+    else:
+        print(f"Warning: key '{par['var_feature_name']}' could not be found in adata.var.", flush=True)
+
+print("Setting .var['feature_id']", flush=True)
+
+if par["var_feature_id"] == "index":
+    adata.var["feature_id"] = adata.var.index
+else:
+    if par["var_feature_id"] in adata.var:
+        adata.var["feature_id"] = adata.var[par["var_feature_id"]]
+        del adata.var[par["var_feature_id"]]
+    else:
+        print(f"Warning: key '{par['var_feature_id']}' could not be found in adata.var.", flush=True)
+
+print("Writing adata to file", flush=True)
+adata.write_h5ad(par["output"], compression="gzip")
diff --git a/src/datasets/loaders/openproblems_v1/test.py b/src/datasets/loaders/openproblems_v1/test.py
new file mode 100644
index 0000000000..f1b0389837
--- /dev/null
+++ b/src/datasets/loaders/openproblems_v1/test.py
@@ -0,0 +1,55 @@
+from os import path
+import subprocess
+import anndata as ad
+
+input_id = "pancreas"
+dataset_id = "openproblems_v1/" + input_id
+output = "dataset.h5ad"
+obs_cell_type = "celltype"
+obs_batch = "tech"
+
+print(">> Running script", flush=True)
+out = subprocess.run(
+    [
+        meta["executable"],
+        "--input_id", input_id,
+        "--dataset_id", dataset_id,
+        "--obs_cell_type", obs_cell_type,
+        "--obs_batch", obs_batch,
+        "--layer_counts", "counts",
+        "--output", output,
+        "--dataset_name", "Pancreas",
+        "--dataset_url", "http://foo.org",
+        "--dataset_reference", 
"foo2000bar", + "--dataset_summary", "A short summary.", + "--dataset_description", "A couple of paragraphs worth of text.", + "--dataset_organism", "homo_sapiens", + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.") + exit(out.returncode) + +print(">> Checking whether file exists", flush=True) +assert path.exists(output), "Output does not exist" + +print(">> Read output anndata", flush=True) +adata = ad.read_h5ad(output) + +print(adata) + +print(">> Check that output fits expected API", flush=True) +assert adata.X is None, "adata.X should be None/empty" +assert "counts" in adata.layers, "Counts layer not found in output layers" +assert adata.uns["dataset_id"] == dataset_id, f"Expected {dataset_id} as value" +if obs_cell_type: + assert "cell_type" in adata.obs.columns, "'cell_type' column not found in obs of anndata output" +if obs_batch: + assert "batch" in adata.obs.columns, "'batch' column not found in obs of anndata output" + +print(">> All tests passed successfully", flush=True) diff --git a/src/datasets/loaders/openproblems_v1_multimodal/config.vsh.yaml b/src/datasets/loaders/openproblems_v1_multimodal/config.vsh.yaml new file mode 100644 index 0000000000..6247ae3bf9 --- /dev/null +++ b/src/datasets/loaders/openproblems_v1_multimodal/config.vsh.yaml @@ -0,0 +1,94 @@ +functionality: + name: "openproblems_v1_multimodal" + namespace: "datasets/loaders" + description: "Fetch a dataset from OpenProblems v1" + argument_groups: + - name: Inputs + arguments: + - name: "--input_id" + type: "string" + description: "The ID of the dataset in OpenProblems v1" + required: true + - name: "--obs_cell_type" + type: "string" + description: "Location of where to find the observation cell types." + - name: "--obs_batch" + type: "string" + description: "Location of where to find the observation batch IDs." + - name: "--obs_tissue" + type: "string" + description: "Location of where to find the observation tissue information." + - name: "--layer_counts" + type: "string" + description: "In which layer to find the counts matrix. Leave undefined to use `.X`." + example: counts + - name: "--sparse" + type: boolean + default: true + description: Convert layers to a sparse CSR format. + - name: "--var_feature_id" + type: "string" + description: "Location of where to find the feature IDs. Can be set to index if the feature IDs are the index." + example: gene_ids + - name: "--var_feature_name" + type: "string" + description: "Location of where to find the feature names. Can be set to index if the feature names are the index." + default: index + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. 
+ required: false + - name: Outputs + arguments: + - name: "--output_mod1" + __merge__: ../../api/file_raw.yaml + direction: "output" + - name: "--output_mod2" + __merge__: ../../api/file_raw.yaml + direction: "output" + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: apt + packages: git + - type: docker + run: | + git clone -b 'v0.8.0' --depth 1 https://github.com/openproblems-bio/openproblems.git /opt/openproblems && \ + pip install --no-cache-dir -r /opt/openproblems/docker/openproblems/requirements.txt && \ + pip install --no-cache-dir --editable /opt/openproblems + - type: nextflow + directives: + label: [highmem, midcpu , midtime] diff --git a/src/datasets/loaders/openproblems_v1_multimodal/script.py b/src/datasets/loaders/openproblems_v1_multimodal/script.py new file mode 100644 index 0000000000..f70e92d048 --- /dev/null +++ b/src/datasets/loaders/openproblems_v1_multimodal/script.py @@ -0,0 +1,169 @@ +from typing import Any, Callable, Dict, Tuple +import openproblems as op +import scanpy as sc +import scipy +import pandas as pd + +## VIASH START +par = { + "dataset_id": "scicar_mouse_kidney", + "obs_tissue": "source", + "obs_cell_type": "cell_type", + "layer_counts": "counts", + "output": "test_data.h5ad", + "dataset_name": "name", + "dataset_url": "https://some.url", + "dataset_reference": "reference", + "dataset_summary": "summary", + "dataset_description": "description", + "dataset_organism": "[homo_sapiens, mus_musculus]", +} +meta = { + "resources_dir": "src/datasets/loaders/openproblems_v1/" +} +## VIASH END + + +# make dataset lookup table +# If need be, this could be stored in a separate yaml file +dataset_funs: Dict[str, Tuple[Callable, Dict[str, Any]]] = { + "citeseq_cbmc": (op.data.multimodal.citeseq.load_citeseq_cbmc, {}), + "scicar_cell_lines": (op.data.multimodal.scicar.load_scicar_cell_lines, {}), + "scicar_mouse_kidney": (op.data.multimodal.scicar.load_scicar_mouse_kidney, {}), +} + +# fetch dataset +dataset_fun, kwargs = dataset_funs[par["input_id"]] + +print("Fetch dataset", flush=True) +adata = dataset_fun(**kwargs) + +print(f"source adata: {adata}", flush=True) + +# construct modality2 dataset +mod2_var_data = { + key.replace("mode2_var_", ""): adata.uns[key] + for key in adata.uns.keys() + if key.startswith("mode2_var_") +} +mod2_var = pd.DataFrame( + mod2_var_data, + index=adata.uns["mode2_var"] +) +mod2_obs = adata.obs.loc[adata.uns["mode2_obs"]] +mod2 = sc.AnnData( + obs=mod2_obs, + var=mod2_var, + layers={ "counts": adata.obsm["mode2"] } +) + +# construct modality1 dataset +mod1 = adata.copy() +mod1.uns = { key: value for key, value in mod1.uns.items() if not key.startswith("mode2_")} +mod1.obsm = { key: value for key, value in mod1.obsm.items() if not key.startswith("mode2_")} +mod1.obsp = { key: value for key, value in mod1.obsp.items() if not key.startswith("mode2_")} +mod1.varm = { key: value for key, value in mod1.varm.items() if not key.startswith("mode2_")} +mod1.varp = { key: value for key, value in mod1.varp.items() if not key.startswith("mode2_")} + +# override values one by one because adata.uns and +# metadata are two different classes. 
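+# As in the unimodal v1 loader, dataset_* values passed via par are applied to
+# both modalities at the end of this script and override anything set here.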
+for key, value in dataset_fun.metadata.items():
+    print(f"Setting .uns['{key}']", flush=True)
+    mod1.uns[key] = value
+    mod2.uns[key] = value
+
+print("Setting .obs['cell_type']", flush=True)
+if par["obs_cell_type"]:
+    if par["obs_cell_type"] in mod1.obs:
+        mod1.obs["cell_type"] = mod1.obs[par["obs_cell_type"]]
+        mod2.obs["cell_type"] = mod2.obs[par["obs_cell_type"]]
+    else:
+        print(f"Warning: key '{par['obs_cell_type']}' could not be found in adata.obs.", flush=True)
+
+print("Setting .obs['batch']", flush=True)
+if par["obs_batch"]:
+    if par["obs_batch"] in mod1.obs:
+        mod1.obs["batch"] = mod1.obs[par["obs_batch"]]
+        mod2.obs["batch"] = mod2.obs[par["obs_batch"]]
+    else:
+        print(f"Warning: key '{par['obs_batch']}' could not be found in adata.obs.", flush=True)
+
+print("Setting .obs['tissue']", flush=True)
+if par["obs_tissue"]:
+    if par["obs_tissue"] in mod1.obs:
+        mod1.obs["tissue"] = mod1.obs[par["obs_tissue"]]
+        mod2.obs["tissue"] = mod2.obs[par["obs_tissue"]]
+    else:
+        print(f"Warning: key '{par['obs_tissue']}' could not be found in adata.obs.", flush=True)
+
+if par["layer_counts"] and par["layer_counts"] in mod1.layers:
+    print(f"Temporarily moving mod1.layers['{par['layer_counts']}']", flush=True)
+    mod1_X = mod1.layers[par["layer_counts"]]
+    del mod1.layers[par["layer_counts"]]
+else:
+    print("Temporarily moving mod1.X", flush=True)
+    mod1_X = mod1.X
+    del mod1.X
+
+if par["sparse"] and not scipy.sparse.issparse(mod1_X):
+    print("Make mod1 counts sparse", flush=True)
+    mod1_X = scipy.sparse.csr_matrix(mod1_X)
+
+if par["sparse"] and not scipy.sparse.issparse(mod2.layers["counts"]):
+    print("Make mod2 counts sparse", flush=True)
+    mod2.layers["counts"] = scipy.sparse.csr_matrix(mod2.layers["counts"])
+
+print("Moving .X to .layers['counts']", flush=True)
+mod1.layers["counts"] = mod1_X
+
+# just in case
+del mod1.X
+del mod2.X
+
+print("Setting .var['feature_name']", flush=True)
+if par["var_feature_name"] == "index":
+    mod1.var["feature_name"] = mod1.var.index
+    mod2.var["feature_name"] = mod2.var.index
+else:
+    if par["var_feature_name"] in mod1.var:
+        mod1.var["feature_name"] = mod1.var[par["var_feature_name"]]
+        del mod1.var[par["var_feature_name"]]
+    else:
+        print(f"Warning: key '{par['var_feature_name']}' could not be found in adata_mod1.var.", flush=True)
+    if par["var_feature_name"] in mod2.var:
+        mod2.var["feature_name"] = mod2.var[par["var_feature_name"]]
+        del mod2.var[par["var_feature_name"]]
+    else:
+        print(f"Warning: key '{par['var_feature_name']}' could not be found in adata_mod2.var.", flush=True)
+
+print("Setting .var['feature_id']", flush=True)
+if par["var_feature_id"] == "index":
+    mod1.var["feature_id"] = mod1.var.index
+    mod2.var["feature_id"] = mod2.var.index
+else:
+    if par["var_feature_id"] in mod1.var:
+        mod1.var["feature_id"] = mod1.var[par["var_feature_id"]]
+        del mod1.var[par["var_feature_id"]]
+    else:
+        print(f"Warning: key '{par['var_feature_id']}' could not be found in adata_mod1.var.", flush=True)
+    if par["var_feature_id"] in mod2.var:
+        mod2.var["feature_id"] = mod2.var[par["var_feature_id"]]
+        del mod2.var[par["var_feature_id"]]
+    else:
+        print(f"Warning: key '{par['var_feature_id']}' could not be found in adata_mod2.var.", flush=True)
+
+
+print("Add metadata to uns", flush=True)
+metadata_fields = [
+    "dataset_id", "dataset_name", "dataset_url", "dataset_reference",
+    "dataset_summary", "dataset_description", "dataset_organism"
+]
+for key in metadata_fields:
+    if key in par:
+        print(f" Setting .uns['{key}']", flush=True)
+        mod1.uns[key] = par[key]
+        mod2.uns[key] = 
par[key] + +print("Writing adata to file", flush=True) +mod1.write_h5ad(par["output_mod1"], compression="gzip") +mod2.write_h5ad(par["output_mod2"], compression="gzip") diff --git a/src/datasets/loaders/openproblems_v1_multimodal/test.py b/src/datasets/loaders/openproblems_v1_multimodal/test.py new file mode 100644 index 0000000000..d6ead5c88d --- /dev/null +++ b/src/datasets/loaders/openproblems_v1_multimodal/test.py @@ -0,0 +1,85 @@ +from os import path +import subprocess +import anndata as ad + +input_id = "scicar_mouse_kidney" +dataset_id = "openproblems_v1_multimodal/" + input_id +obs_cell_type = "cell_name" +obs_batch = "replicate" +obs_tissue = None + +output_mod1_file = "output_mod1.h5ad" +output_mod2_file = "output_mod2.h5ad" + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta["executable"], + "--input_id", input_id, + "--dataset_id", dataset_id, + "--obs_cell_type", obs_cell_type, + "--obs_batch", obs_batch, + "--layer_counts", "counts", + "--output_mod1", output_mod1_file, + "--output_mod2", output_mod2_file, + "--dataset_name", "Pancreas", + "--dataset_url", "http://foo.org", + "--dataset_reference", "foo2000bar", + "--dataset_summary", "A short summary.", + "--dataset_description", "A couple of paragraphs worth of text.", + "--dataset_organism", "homo_sapiens", + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether files exist", flush=True) +assert path.exists(output_mod1_file), "Output mod1 file does not exist" +assert path.exists(output_mod2_file), "Output mod2 file does not exist" + +print(">> Read output anndata", flush=True) +output_mod1 = ad.read_h5ad(output_mod1_file) +output_mod2 = ad.read_h5ad(output_mod2_file) + +print(f"output_mod1: {output_mod1}", flush=True) +print(f"output_mod2: {output_mod2}", flush=True) + +print(">> Check that output mod1 fits expected API", flush=True) +assert output_mod1.X is None, ".X is not None/empty in mod 1 output" +assert "counts" in output_mod1.layers, "'counts' not found in mod 1 output layers" +if obs_cell_type: + assert "cell_type" in output_mod1.obs.columns, "cell_type column not found in mod 1 output obs" +if obs_batch: + assert "batch" in output_mod1.obs.columns, "batch column not found in mod 1 output obs" +if obs_tissue: + assert "tissue" in output_mod1.obs.columns, "tissue column not found in mod 1 output obs" +assert output_mod1.uns["dataset_id"] == dataset_id, f"Expected: {dataset_id} as value for dataset_id in mod 1 output uns" +assert output_mod1.uns["dataset_name"] == "Pancreas", "Expected: Pancreas as value for dataset_name in mod 1 output uns" +assert output_mod1.uns["dataset_url"] == "http://foo.org", "Expected: http://foo.org as value for dataset_url in mod 1 output uns" +assert output_mod1.uns["dataset_reference"] == "foo2000bar", "Expected: foo2000bar as value for dataset_reference in mod 1 output uns" +assert output_mod1.uns["dataset_summary"] == "A short summary.", "Expected: A short summary. as value for dataset_summary in mod 1 output uns" +assert output_mod1.uns["dataset_description"] == "A couple of paragraphs worth of text.", "Expected: A couple of paragraphs worth of text. 
as value for dataset_description in mod 1 output uns" + +print(">> Check that output mod2 fits expected API", flush=True) +assert output_mod2.X is None, ".X is not None/empty in mod 2 output" +assert "counts" in output_mod2.layers, "'counts' not found in mod 2 output layers" +if obs_cell_type: + assert "cell_type" in output_mod2.obs.columns, "cell_type column not found in mod 2 output obs" +if obs_batch: + assert "batch" in output_mod2.obs.columns, "batch column not found in mod 2 output obs" +if obs_tissue: + assert "tissue" in output_mod2.obs.columns, "tissue column not found in mod 2 output obs" +assert output_mod2.uns["dataset_id"] == dataset_id, f"Expected: {dataset_id} as value for dataset_id in mod 2 output uns" +assert output_mod2.uns["dataset_name"] == "Pancreas", "Expected: Pancreas as value for dataset_name in mod 2 output uns" +assert output_mod2.uns["dataset_url"] == "http://foo.org", "Expected: http://foo.org as value for dataset_url in mod 2 output uns" +assert output_mod2.uns["dataset_reference"] == "foo2000bar", "Expected: foo2000bar as value for dataset_reference in mod 2 output uns" +assert output_mod2.uns["dataset_summary"] == "A short summary.", "Expected: A short summary. as value for dataset_summary in mod 2 output uns" +assert output_mod2.uns["dataset_description"] == "A couple of paragraphs worth of text.", "Expected: A couple of paragraphs worth of text. as value for dataset_description in mod 2 output uns" + +print(">> All tests passed successfully", flush=True) diff --git a/src/datasets/loaders/tenx_visium/config.vsh.yaml b/src/datasets/loaders/tenx_visium/config.vsh.yaml new file mode 100644 index 0000000000..ba28b32b89 --- /dev/null +++ b/src/datasets/loaders/tenx_visium/config.vsh.yaml @@ -0,0 +1,96 @@ +functionality: + name: tenx_visium + namespace: datasets/loaders + description: | + Download a SpaceRanger h5 gene expression file and spatial imaging data from the 10x genomics website (or someplace else). + + argument_groups: + - name: Inputs + arguments: + - name: "--input_expression" + type: string + description: URL to the feature / barcode matrix HDF5 of the 10x dataset. + required: true + - name: "--input_spatial" + type: string + description: URL to the Spatial imaging data of the 10x dataset. + required: true + - name: Outputs + arguments: + - name: "--dataset" + type: file + direction: output + description: Output h5ad file + required: true + example: dataset.h5ad + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. 
+ required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitochondrial genes? + required: false + + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test.py + +platforms: + - type: docker + image: ghcr.io/openproblems-bio/base_python:1.0.4 + setup: + - type: python + packages: + - squidpy + - type: nextflow diff --git a/src/datasets/loaders/tenx_visium/script.py b/src/datasets/loaders/tenx_visium/script.py new file mode 100644 index 0000000000..7de04e6b5e --- /dev/null +++ b/src/datasets/loaders/tenx_visium/script.py @@ -0,0 +1,82 @@ +import subprocess +import squidpy as sq +import tempfile +import scanpy as sc + +## VIASH START +par = { + "input_expression": "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_filtered_feature_bc_matrix.h5", + "input_spatial": "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_spatial.tar.gz", + "dataset_id": "tenx_visium/mouse_brain_coronal_section1_visium", + "dataset_name": "Mouse Brain Coronal Section 1 (FFPE)", + "dataset_url": "https://www.10xgenomics.com/datasets/mouse-brain-coronal-section-1-ffpe-2-standard", + "dataset_summary": "Gene expression library of Mouse Brain (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set", + "dataset_organism": "Mus musculus", + "dataset": "dataset.h5ad", + "spot_filter_min_genes": 200, + "gene_filter_min_spots": 50, + "remove_mitochondrial": True +} +meta = { + "functionality_name": "tenx_visium" +} +## VIASH END + +print(f"Downloading data", flush=True) +with tempfile.TemporaryDirectory() as tempdir: + input_exp = "feature_bc_matrix.h5" + input_sp = "image_data.tar.gz" + epx_data = subprocess.run(["wget", "-O", f"{tempdir}/{input_exp}", par['input_expression']], stderr=subprocess.STDOUT) + sp_data = subprocess.run(["wget", "-O", f"{tempdir}/{input_sp}", par['input_spatial']], stderr=subprocess.STDOUT) + extract_spatial = subprocess.run(["tar", "-xzf", f"{tempdir}/{input_sp}", "-C", tempdir], stderr=subprocess.STDOUT) + + # Read visium data and create anndata object + adata = sq.read.visium(path=tempdir, counts_file=input_exp) + +# Make variable names unique +adata.var_names_make_unique() + +sc.pp.calculate_qc_metrics(adata, inplace=True) + +print("Filtering spots or genes") +t0 = adata.shape +# remove cells with few counts +if par["spot_filter_min_counts"]: + sc.pp.filter_cells(adata, min_counts=par["spot_filter_min_counts"], inplace=True) +# remove cells with few genes +if par["spot_filter_min_genes"]: + sc.pp.filter_cells(adata, min_genes=par["spot_filter_min_genes"], inplace=True) +# remove genes that have few counts +if par["gene_filter_min_counts"]: + sc.pp.filter_genes(adata, min_counts=par["gene_filter_min_counts"], inplace=True) +# remove genes that are found in few cells +if par["gene_filter_min_spots"]: + sc.pp.filter_genes(adata, min_cells=par["gene_filter_min_spots"], inplace=True) +t1 = adata.shape +print(f"Removed {t0[0] - t1[0]} cells 
and {(t0[1] - t1[1])} genes.") + +if par["remove_mitochondrial"]: + print("Removing mitochondrial genes") + non_mito_genes_list = [name for name in adata.var_names if not (name.startswith('MT-') or name.startswith('mt-'))] + adata = adata[:, non_mito_genes_list] + +# Rename .var columns +adata.var['feature_name'] = adata.var_names +adata.var.set_index(adata.var['gene_ids'], inplace=True) +adata.var.rename(columns={"gene_ids": "feature_id"}, inplace=True) + +# Move counts to .layers +print("Add metadata to uns", flush=True) +adata.layers["counts"] = adata.X +adata.X = None + +# Add metadata +print("Add metadata to uns", flush=True) +metadata_fields = ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"] +for key in metadata_fields: + if key in par: + print(f"Setting .uns['{key}']", flush=True) + adata.uns[key] = par[key] + +print("Writing adata to file", flush=True) +adata.write_h5ad(par["dataset"], compression="gzip") \ No newline at end of file diff --git a/src/datasets/loaders/tenx_visium/test.py b/src/datasets/loaders/tenx_visium/test.py new file mode 100644 index 0000000000..a559ae1d3d --- /dev/null +++ b/src/datasets/loaders/tenx_visium/test.py @@ -0,0 +1,57 @@ +import os +import subprocess +import anndata as ad + +input_expression ="https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_filtered_feature_bc_matrix.h5" +input_spatial = "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_spatial.tar.gz" +dataset_id = "10x_visium/mouse_brain_coronal_section1" +dataset_name = "Mouse Brain Coronal Section 1 (FFPE)" +dataset_url = "https://www.10xgenomics.com/datasets/mouse-brain-coronal-section-1-ffpe-2-standard" +dataset_summary = "Gene expression library of Mouse Brain (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set" +dataset_description = "CytAssist_FFPE_Mouse_Brain_Rep1 - Gene expression library of Mouse Brain (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set" +dataset_organism = "Mus musculus" +dataset = "dataset.h5ad" + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta['executable'], + "--input_expression", input_expression, + "--input_spatial", input_spatial, + "--dataset_id", dataset_id, + "--dataset_name", dataset_name, + "--dataset_url", dataset_url, + "--dataset_summary", dataset_summary, + "--dataset_description", dataset_description, + "--dataset_organism", dataset_organism, + "--dataset", dataset + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether output file exists", flush=True) +assert os.path.exists(dataset), "Output does not exist" + +print(">> Read output anndata", flush=True) +adata = ad.read_h5ad(dataset) + +print(adata) + +print(">> Check that output fits expected API", flush=True) +assert adata.X is None, "adata.X should be None/empty" +assert "counts" in adata.layers, "Counts layer not found in .layers" +assert adata.uns["dataset_id"] == dataset_id, f"Expected {dataset_id} as value" +assert adata.uns["dataset_name"] == dataset_name, f"Expected {dataset_name} as value" +assert adata.uns["dataset_url"] == dataset_url, f"Expected {dataset_url} as value" +assert adata.uns["dataset_summary"] == dataset_summary, f"Expected {dataset_summary} as value" 
+assert adata.uns["dataset_organism"] == dataset_organism, f"Expected {dataset_organism} as value" +assert 'spatial' in adata.obsm, "Spatial spot coordinates not found in .obsm" + +print(">> All tests passed successfully", flush=True) diff --git a/src/datasets/loaders/zenodo_spatial/config.vsh.yaml b/src/datasets/loaders/zenodo_spatial/config.vsh.yaml new file mode 100644 index 0000000000..776b177481 --- /dev/null +++ b/src/datasets/loaders/zenodo_spatial/config.vsh.yaml @@ -0,0 +1,87 @@ +functionality: + name: zenodo_spatial + namespace: datasets/loaders + description: | + Download an Anndata file containing DBiT seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo. + argument_groups: + - name: Inputs + arguments: + - name: "--input_data" + type: string + description: URL to the Anndata file. + required: true + - name: Outputs + arguments: + - name: "--dataset" + type: file + direction: output + description: Output h5ad file + required: true + example: dataset.h5ad + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. + required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitochondrial genes? 
+          required: false
+
+  resources:
+    - type: python_script
+      path: script.py
+  test_resources:
+    - type: python_script
+      path: test.py
+
+platforms:
+  - type: docker
+    image: openproblems/base_python:1.0.0
+  - type: nextflow
diff --git a/src/datasets/loaders/zenodo_spatial/script.py b/src/datasets/loaders/zenodo_spatial/script.py
new file mode 100644
index 0000000000..83aeb86056
--- /dev/null
+++ b/src/datasets/loaders/zenodo_spatial/script.py
@@ -0,0 +1,85 @@
+import subprocess
+import tempfile
+import scanpy as sc
+
+# VIASH START
+par = {
+    "input_data": "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15_data_whole.h5ad?download=1",
+    "dataset_id": "zenodo_spatial/mouse_olfactory_bulb_puck_slideseqv2",
+    "dataset_name": "Mouse Olfactory Bulb Puck",
+    "dataset_url": "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary",
+    "dataset_summary": "Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2",
+    "dataset_organism": "Mus musculus",
+    "dataset": "dataset.h5ad",
+    "spot_filter_min_genes": 10,
+    "gene_filter_min_spots": 500,
+    "remove_mitochondrial": True
+}
+meta = {
+    "functionality_name": "zenodo_spatial"
+}
+# VIASH END
+
+print("Downloading data", flush=True)
+with tempfile.TemporaryDirectory() as tempdir:
+    input_data = "input_data.h5ad"
+    exp_data = subprocess.run(["wget", "-O", f"{tempdir}/{input_data}", par['input_data']], stderr=subprocess.STDOUT)
+    adata = sc.read_h5ad(filename=f"{tempdir}/{input_data}")
+
+# Make variable names unique
+adata.var_names_make_unique()
+
+sc.pp.calculate_qc_metrics(adata, inplace=True, percent_top=None)
+
+print("Filtering spots or genes")
+t0 = adata.shape
+# remove cells with few counts
+if par["spot_filter_min_counts"]:
+    sc.pp.filter_cells(
+        adata, min_counts=par["spot_filter_min_counts"], inplace=True)
+
+# remove cells with few genes
+if par["spot_filter_min_genes"]:
+    sc.pp.filter_cells(
+        adata, min_genes=par["spot_filter_min_genes"], inplace=True)
+
+# remove genes that have few counts
+if par["gene_filter_min_counts"]:
+    sc.pp.filter_genes(
+        adata, min_counts=par["gene_filter_min_counts"], inplace=True)
+
+# remove genes that are found in few cells
+if par["gene_filter_min_spots"]:
+    sc.pp.filter_genes(
+        adata, min_cells=par["gene_filter_min_spots"], inplace=True)
+
+t1 = adata.shape
+print(f"Removed {t0[0] - t1[0]} cells and {(t0[1] - t1[1])} genes.")
+
+if par["remove_mitochondrial"]:
+    print("Removing mitochondrial genes")
+    non_mito_genes_list = [name for name in adata.var_names if not (
+        name.startswith('MT-') or name.startswith('mt-'))]
+    adata = adata[:, non_mito_genes_list]
+
+# Rename .var columns
+adata.var['feature_name'] = adata.var_names
+if 'gene_ids' in adata.var:
+    adata.var.set_index(adata.var['gene_ids'], inplace=True)
+    adata.var.rename(columns={"gene_ids": "feature_id"}, inplace=True)
+
+# Move counts to .layers
+print("Move counts to .layers", flush=True)
+adata.layers["counts"] = adata.X
+adata.X = None
+
+# Add metadata
+print("Add metadata to uns", flush=True)
+metadata_fields = ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"]
+for key in metadata_fields:
+    if key in par:
+        print(f"Setting .uns['{key}']", flush=True)
+        adata.uns[key] = par[key]
+
+print("Writing adata to file", flush=True)
+adata.write_h5ad(par["dataset"], compression="gzip") diff --git a/src/datasets/loaders/zenodo_spatial/test.py b/src/datasets/loaders/zenodo_spatial/test.py new file mode 100644 index 0000000000..07dcd953a8 --- /dev/null +++ b/src/datasets/loaders/zenodo_spatial/test.py @@ -0,0 +1,55 @@ +import os +import subprocess +import anndata as ad + +input_data ="https://zenodo.org/records/12784832/files/Slide-seqV2_stickels2020highly_stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15_data_whole.h5ad?download=1" +dataset_id = "zenodo_spatial/mouse_olfactory_bulb_puck" +dataset_name = "mouse_olfactory_bulb_puck" +dataset_url = "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +dataset_summary = "Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2" +dataset_description = "Gene expression library of mouse olfactory bulk puck profiled using Slide-seq V2" +dataset_organism = "Mus musculus" +dataset = "dataset.h5ad" + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta['executable'], + "--input_data", input_data, + "--dataset_id", dataset_id, + "--dataset_name", dataset_name, + "--dataset_url", dataset_url, + "--dataset_summary", dataset_summary, + "--dataset_description", dataset_description, + "--dataset_organism", dataset_organism, + "--dataset", dataset + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether output file exists", flush=True) +assert os.path.exists(dataset), "Output does not exist" + +print(">> Read output anndata", flush=True) +adata = ad.read_h5ad(dataset) + +print(adata) + +print(">> Check that output fits expected API", flush=True) +assert adata.X is None, "adata.X should be None/empty" +assert "counts" in adata.layers, "Counts layer not found in .layers" +assert adata.uns["dataset_id"] == dataset_id, f"Expected {dataset_id} as value" +assert adata.uns["dataset_name"] == dataset_name, f"Expected {dataset_name} as value" +assert adata.uns["dataset_url"] == dataset_url, f"Expected {dataset_url} as value" +assert adata.uns["dataset_summary"] == dataset_summary, f"Expected {dataset_summary} as value" +assert adata.uns["dataset_organism"] == dataset_organism, f"Expected {dataset_organism} as value" +assert 'spatial' in adata.obsm, "Spatial spot coordinates not found in .obsm" + +print(">> All tests passed successfully", flush=True) diff --git a/src/datasets/loaders/zenodo_spatial_slidetags/config.vsh.yaml b/src/datasets/loaders/zenodo_spatial_slidetags/config.vsh.yaml new file mode 100644 index 0000000000..905be3514c --- /dev/null +++ b/src/datasets/loaders/zenodo_spatial_slidetags/config.vsh.yaml @@ -0,0 +1,88 @@ +functionality: + name: zenodo_spatial_slidetags + namespace: datasets/loaders + description: | + Download a compressed file containing gene expression matrix and spatial locations from zenodo. + + argument_groups: + - name: Inputs + arguments: + - name: "--input_data" + type: string + description: URL to the file. + required: true + - name: Outputs + arguments: + - name: "--dataset" + type: file + direction: output + description: Output h5ad file + required: true + example: dataset.h5ad + - name: Metadata + arguments: + - name: "--dataset_id" + type: string + description: Unique identifier of the dataset. 
+ required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. + required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitochondrial genes? + required: false + + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test.py + +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow diff --git a/src/datasets/loaders/zenodo_spatial_slidetags/script.py b/src/datasets/loaders/zenodo_spatial_slidetags/script.py new file mode 100644 index 0000000000..5a8cf212fa --- /dev/null +++ b/src/datasets/loaders/zenodo_spatial_slidetags/script.py @@ -0,0 +1,103 @@ +import subprocess +import pandas as pd +import tempfile +import scanpy as sc + +# VIASH START +par = { + "input_data": "https://zenodo.org/records/12785822/files/slidetag_human_cortex.tar.gz?download=1", + "dataset_id": "zenodo_spatial_slidetags/human_cortex_slidetags", + "dataset_name": "slidetag_human_cortex", + "dataset_url": "https://www.nature.com/articles/s41586-023-06837-4", + "dataset_summary": "Slide-tags enables single-nucleus barcoding for multimodal spatial genomics", + "dataset_organism": "Homo sapiens", + "dataset": "dataset.h5ad", + "spot_filter_min_genes": 200, + "gene_filter_min_spots": 50, + "remove_mitochondrial": True +} +meta = { + "functionality_name": "zenodo_spatial_slidetags" +} +# VIASH END + +print(f"Downloading data", flush=True) +with tempfile.TemporaryDirectory() as tempdir: + input_data = "input_data.tar.gz" + dataset_name = par['dataset_name'] + epx_data = subprocess.run( + ["wget", "-O", f"{tempdir}/{input_data}", par['input_data']], stderr=subprocess.STDOUT) + extract_spatial = subprocess.run( + ["tar", "-xzf", f"{tempdir}/{input_data}", "-C", tempdir, "--strip-components=1"], stderr=subprocess.STDOUT) + + # Read gene expression and create anndata object + adata = sc.read_10x_mtx(path=tempdir) + + # Read spatial locations + df = pd.read_csv(f"{tempdir}/spatial.csv", skiprows=1) + df = df.set_index('TYPE') + df.columns = ['spatial1', 'spatial2', 'cell_type'] + + # add spatial locations to anndata object + sel_cells = list(set(df.index) & set(adata.obs_names)) + + df = df.loc[sel_cells, ] + adata 
= adata[sel_cells, ] + + adata.obs = df + adata.obsm['spatial'] = df[['spatial2', 'spatial1']].values + +# Make variable names unique +adata.var_names_make_unique() + +sc.pp.calculate_qc_metrics(adata, inplace=True) + +print("Filtering spots or genes") +t0 = adata.shape +# remove cells with few counts +if par["spot_filter_min_counts"]: + sc.pp.filter_cells( + adata, min_counts=par["spot_filter_min_counts"], inplace=True) +# remove cells with few genes +if par["spot_filter_min_genes"]: + sc.pp.filter_cells( + adata, min_genes=par["spot_filter_min_genes"], inplace=True) +# remove genes that have few counts +if par["gene_filter_min_counts"]: + sc.pp.filter_genes( + adata, min_counts=par["gene_filter_min_counts"], inplace=True) +# remove genes that are found in few cells +if par["gene_filter_min_spots"]: + sc.pp.filter_genes( + adata, min_cells=par["gene_filter_min_spots"], inplace=True) +t1 = adata.shape +print(f"Removed {t0[0] - t1[0]} cells and {(t0[1] - t1[1])} genes.") + +if par["remove_mitochondrial"]: + print("Removing mitochondrial genes") + non_mito_genes_list = [name for name in adata.var_names if not ( + name.startswith('MT-') or name.startswith('mt-'))] + adata = adata[:, non_mito_genes_list] + + +# Rename .var columns +adata.var['feature_name'] = adata.var_names +adata.var.set_index(adata.var['gene_ids'], inplace=True) +adata.var.rename(columns={"gene_ids": "feature_id"}, inplace=True) + +# Move counts to .layers +print("Add metadata to uns", flush=True) +adata.layers["counts"] = adata.X +adata.X = None + +# Add metadata +print("Add metadata to uns", flush=True) +metadata_fields = ["dataset_id", "dataset_name", "dataset_url", + "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"] +for key in metadata_fields: + if key in par: + print(f"Setting .uns['{key}']", flush=True) + adata.uns[key] = par[key] + +print("Writing adata to file", flush=True) +adata.write_h5ad(par["dataset"], compression="gzip") diff --git a/src/datasets/loaders/zenodo_spatial_slidetags/test.py b/src/datasets/loaders/zenodo_spatial_slidetags/test.py new file mode 100644 index 0000000000..9f859ebea6 --- /dev/null +++ b/src/datasets/loaders/zenodo_spatial_slidetags/test.py @@ -0,0 +1,55 @@ +import os +import subprocess +import anndata as ad + +input_data ="https://zenodo.org/records/12785822/files/slidetag_human_cortex.tar.gz?download=1" +dataset_id = "zenodo_spatial_slidetags/human_cortex" +dataset_name = "slidetag_human_cortex" +dataset_url = "https://www.nature.com/articles/s41586-023-06837-4" +dataset_summary = "Slide-tags enables single-nucleus barcoding for multimodal spatial genomics" +dataset_description = "A 100 mm2 region of the human prefrontal cortex from a neurotypical donor aged 78 years was profiled by Slide-tags" +dataset_organism = "Homo sapiens" +dataset = "dataset.h5ad" + +print(">> Running script", flush=True) +out = subprocess.run( + [ + meta['executable'], + "--input_data", input_data, + "--dataset_id", dataset_id, + "--dataset_name", dataset_name, + "--dataset_url", dataset_url, + "--dataset_summary", dataset_summary, + "--dataset_description", dataset_description, + "--dataset_organism", dataset_organism, + "--dataset", dataset + ], + stderr=subprocess.STDOUT +) + +if out.stdout: + print(out.stdout, flush=True) + +if out.returncode: + print(f"script: '{out.args}' exited with an error.", flush=True) + exit(out.returncode) + +print(">> Checking whether output file exists", flush=True) +assert os.path.exists(dataset), "Output does not exist" + +print(">> Read output 
anndata", flush=True) +adata = ad.read_h5ad(dataset) + +print(adata) + +print(">> Check that output fits expected API", flush=True) +assert adata.X is None, "adata.X should be None/empty" +assert "counts" in adata.layers, "Counts layer not found in .layers" +assert adata.uns["dataset_id"] == dataset_id, f"Expected {dataset_id} as value" +assert adata.uns["dataset_name"] == dataset_name, f"Expected {dataset_name} as value" +assert adata.uns["dataset_url"] == dataset_url, f"Expected {dataset_url} as value" +assert adata.uns["dataset_summary"] == dataset_summary, f"Expected {dataset_summary} as value" +assert adata.uns["dataset_organism"] == dataset_organism, f"Expected {dataset_organism} as value" +assert 'spatial' in adata.obsm, "Spatial spot coordinates not found in .obsm" + +print(">> All tests passed successfully", flush=True) diff --git a/src/datasets/normalization/atac_tfidf/config.vsh.yaml b/src/datasets/normalization/atac_tfidf/config.vsh.yaml new file mode 100644 index 0000000000..5a8f56306a --- /dev/null +++ b/src/datasets/normalization/atac_tfidf/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "atac_tfidf" + description: | + Transform peak counts with TF-IDF (Term Frequency - Inverse Document Frequency). + + TF: peak counts are normalised by total number of counts per cell DF: total number of counts for each peak IDF: number of cells divided by DF + + By default, log(TF) * log(IDF) is returned. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - muon + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/atac_tfidf/script.py b/src/datasets/normalization/atac_tfidf/script.py new file mode 100644 index 0000000000..ecb772bd64 --- /dev/null +++ b/src/datasets/normalization/atac_tfidf/script.py @@ -0,0 +1,26 @@ +import anndata as ad +from muon import atac as ac + +## VIASH START +par = { + 'input': "resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod2.h5ad", + 'output': "output_norm.h5ad" +} +meta = { + 'functionality_name': "tfidf" +} +## VIASH END + +print("Load data", flush=True) +adata = ad.read_h5ad(par['input']) + +print("Normalize data", flush=True) +input_adata = ad.AnnData(X=adata.layers["counts"]) +normalized_counts = ac.pp.tfidf(input_adata, inplace=False) + +print("Store output in adata", flush=True) +adata.layers[par["layer_output"]] = normalized_counts +adata.uns["normalization_id"] = par["normalization_id"] or meta['functionality_name'] + +print("Write data", flush=True) +adata.write_h5ad(par['output'], compression="gzip") diff --git a/src/datasets/normalization/l1_sqrt/config.vsh.yaml b/src/datasets/normalization/l1_sqrt/config.vsh.yaml new file mode 100644 index 0000000000..212eadc968 --- /dev/null +++ b/src/datasets/normalization/l1_sqrt/config.vsh.yaml @@ -0,0 +1,27 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "l1_sqrt" + description: | + Scaled L1 sqrt normalization. + + This normalization method causes all cells to have the same sum of values. + + Steps: + + * Compute the square root of the counts. + * Apply L1 normalization (rescaled such that the sum of the values of each cell sum to 1). + * Multiply by the median UMI count per cell, causing all cells to have the sum of values. 
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - scprep + - numpy<2 + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/l1_sqrt/script.py b/src/datasets/normalization/l1_sqrt/script.py new file mode 100644 index 0000000000..76c69cf897 --- /dev/null +++ b/src/datasets/normalization/l1_sqrt/script.py @@ -0,0 +1,29 @@ +import anndata as ad +import scprep +import numpy as np + +## VIASH START +par = { + 'input': "output_train.h5ad", + 'output': "output_norm.h5ad" +} +meta = { + 'functionality_name': "l1_sqrt" +} +## VIASH END + +print("Load data", flush=True) +adata = ad.read_h5ad(par['input']) + +print("Normalize data", flush=True) +# libsize and sqrt L1 norm +sqrt_data = scprep.utils.matrix_transform(adata.layers['counts'], np.sqrt) +l1_sqrt, libsize = scprep.normalize.library_size_normalize(sqrt_data, rescale=1, return_library_size=True) +l1_sqrt = l1_sqrt.tocsr() + +print("Store output in adata", flush=True) +adata.layers[par["layer_output"]] = l1_sqrt +adata.uns["normalization_id"] = par["normalization_id"] or meta['functionality_name'] + +print("Write data", flush=True) +adata.write_h5ad(par['output'], compression="gzip") diff --git a/src/datasets/normalization/log_cp/config.vsh.yaml b/src/datasets/normalization/log_cp/config.vsh.yaml new file mode 100644 index 0000000000..89b2a283f9 --- /dev/null +++ b/src/datasets/normalization/log_cp/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "log_cp" + description: "Normalize data using Log CP" + resources: + - type: python_script + path: script.py + arguments: + - name: "--n_cp" + type: integer + default: 1e4 + description: "Number of counts per cell. When set to -1, will use None." 
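+  # For example, `n_cp: 10000` yields log CP10k and `n_cp: 1000000` yields log CPM;
+  # with `n_cp: -1` the script passes `target_sum=None` to scanpy's
+  # `normalize_total`, which scales each cell to the median of the
+  # pre-normalization total counts.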
+platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/log_cp/script.py b/src/datasets/normalization/log_cp/script.py new file mode 100644 index 0000000000..39ddf61636 --- /dev/null +++ b/src/datasets/normalization/log_cp/script.py @@ -0,0 +1,42 @@ +import scanpy as sc + +## VIASH START +par = { + 'input': "resources_test/common/pancreas/dataset.h5ad", + 'output': "output.h5ad", + 'layer_output': "log_cp10k", + 'obs_size_factors': "log_cp10k_size_factors", + 'n_cp': 1e6, +} +meta = { + "functionality_name": "normalize_log_cp10k" +} +## VIASH END + +print(">> Load data", flush=True) +adata = sc.read_h5ad(par['input']) + +print(">> Normalize data", flush=True) +if par["n_cp"] == -1: + norm = sc.pp.normalize_total( + adata, + target_sum=None, + layer="counts", + inplace=False + ) +else: + norm = sc.pp.normalize_total( + adata, + target_sum=par["n_cp"], + layer="counts", + inplace=False + ) +lognorm = sc.pp.log1p(norm["X"]) + +print(">> Store output in adata", flush=True) +adata.layers[par["layer_output"]] = lognorm +adata.obs[par["obs_size_factors"]] = norm["norm_factor"] +adata.uns["normalization_id"] = par["normalization_id"] or meta['functionality_name'] + +print(">> Write data", flush=True) +adata.write_h5ad(par['output'], compression="gzip") diff --git a/src/datasets/normalization/log_scran_pooling/config.vsh.yaml b/src/datasets/normalization/log_scran_pooling/config.vsh.yaml new file mode 100644 index 0000000000..4cbf81ff5a --- /dev/null +++ b/src/datasets/normalization/log_scran_pooling/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "log_scran_pooling" + description: "Normalize data using scran pooling" + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ Matrix, rlang, scran, BiocParallel ] + - type: python + pip: scanpy + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/log_scran_pooling/script.R b/src/datasets/normalization/log_scran_pooling/script.R new file mode 100644 index 0000000000..be51e21f38 --- /dev/null +++ b/src/datasets/normalization/log_scran_pooling/script.R @@ -0,0 +1,38 @@ +cat(">> Loading dependencies\n") +library(anndata, warn.conflicts = FALSE) +requireNamespace("scran", quietly = TRUE) +requireNamespace("BiocParallel", quietly = TRUE) +library(Matrix, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input = "resources_test/label_projection/pancreas/datas.h5ad", + output = "output.scran.h5ad", + layer_output = "log_scran_pooling", + obs_size_factors = "size_factors_log_scran_pooling" +) +## VIASH END + +cat(">> Load data\n") +adata <- anndata::read_h5ad(par$input) +counts <- as(t(adata$layers[["counts"]]), "CsparseMatrix") + +cat(">> Normalizing data\n") +size_factors <- scran::calculateSumFactors( + counts, + min.mean = 0.1, + BPPARAM = BiocParallel::MulticoreParam() +) +lognorm <- log1p(sweep(adata$layers[["counts"]], 1, size_factors, "*")) + +cat(">> Storing in anndata\n") +adata$obs[[par$obs_size_factors]] <- size_factors +adata$layers[[par$layer_output]] <- lognorm +norm_id <- par[["normalization_id"]] +if (is.null(norm_id)) { + norm_id <- meta[["functionality_name"]] +} +adata$uns[["normalization_id"]] <- norm_id + +cat(">> Writing to file\n") +zzz <- adata$write_h5ad(par$output, compression = "gzip") diff --git 
a/src/datasets/normalization/prot_clr/config.vsh.yaml b/src/datasets/normalization/prot_clr/config.vsh.yaml new file mode 100644 index 0000000000..8f6bbe269f --- /dev/null +++ b/src/datasets/normalization/prot_clr/config.vsh.yaml @@ -0,0 +1,26 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "prot_clr" + description: | + Perform center log ratio (CLR) normalization on input CITE-seq data (Stoeckius et al. 2017). + + The CLR transformation is defined as: + + $$ + x_{\text{clr}} = \log\left(\frac{x}{g(x)}\right) + $$ + + where $\(g(x)\)$ is the geometric mean of the row $\(x\)$. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - muon + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/prot_clr/script.py b/src/datasets/normalization/prot_clr/script.py new file mode 100644 index 0000000000..3f0a2fb3fd --- /dev/null +++ b/src/datasets/normalization/prot_clr/script.py @@ -0,0 +1,28 @@ +import anndata as ad +from muon import prot as pt + +## VIASH START +par = { + 'input': "resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod2.h5ad", + 'output': "output_norm.h5ad" +} +meta = { + 'functionality_name': "clr" +} +## VIASH END + +print("Load data", flush=True) +adata = ad.read_h5ad(par['input']) + +print("Normalize data", flush=True) +input_adata = ad.AnnData(X=adata.layers["counts"]) +normalized_counts = pt.pp.clr(input_adata, inplace=False) +if not normalized_counts: + raise RuntimeError("CLR failed to return the requested output layer") + +print("Store output in adata", flush=True) +adata.layers[par["layer_output"]] = normalized_counts.X +adata.uns["normalization_id"] = par["normalization_id"] or meta['functionality_name'] + +print("Write data", flush=True) +adata.write_h5ad(par['output'], compression="gzip") diff --git a/src/datasets/normalization/sqrt_cp/config.vsh.yaml b/src/datasets/normalization/sqrt_cp/config.vsh.yaml new file mode 100644 index 0000000000..4d95636f4c --- /dev/null +++ b/src/datasets/normalization/sqrt_cp/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../../api/comp_normalization.yaml +functionality: + name: "sqrt_cp" + description: "Normalize data using Log Sqrt" + resources: + - type: python_script + path: script.py + arguments: + - name: "--n_cp" + type: integer + default: 1e4 + description: "Number of counts per cell" +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/datasets/normalization/sqrt_cp/script.py b/src/datasets/normalization/sqrt_cp/script.py new file mode 100644 index 0000000000..84afdaa19d --- /dev/null +++ b/src/datasets/normalization/sqrt_cp/script.py @@ -0,0 +1,35 @@ +import scanpy as sc +import numpy as np + +## VIASH START +par = { + 'input': "resources_test/common/pancreas/dataset.h5ad", + 'output': "output.h5ad", + 'layer_output': "sqrt_cpm", + 'obs_size_factors': "size_factors_sqrt_cpm", + 'n_cp': 1e6, +} +meta = { + "functionality_name": "normalize_sqrt_cpm" +} +## VIASH END + +print(">> Load data", flush=True) +adata = sc.read_h5ad(par['input']) + +print(">> Normalize data", flush=True) +norm = sc.pp.normalize_total( + adata, + target_sum=par['n_cp'], + layer="counts", + inplace=False +) +lognorm = np.sqrt(norm['X']) + +print(">> Store output in adata", flush=True) +adata.layers[par["layer_output"]] = lognorm 
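+# Note: with inplace=False, sc.pp.normalize_total returns a dict whose "X" entry is
+# the scaled matrix and whose "norm_factor" entry holds the per-cell size factors
+# stored in .obs below.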
+adata.obs[par["obs_size_factors"]] = norm["norm_factor"]
+adata.uns["normalization_id"] = par["normalization_id"] or meta['functionality_name']
+
+print(">> Write data", flush=True)
+adata.write_h5ad(par['output'], compression="gzip")
diff --git a/src/datasets/processors/hvg/config.vsh.yaml b/src/datasets/processors/hvg/config.vsh.yaml
new file mode 100644
index 0000000000..aed18c6d38
--- /dev/null
+++ b/src/datasets/processors/hvg/config.vsh.yaml
@@ -0,0 +1,13 @@
+__merge__: ../../api/comp_processor_hvg.yaml
+functionality:
+  name: "hvg"
+  description: "Compute HVG"
+  resources:
+    - type: python_script
+      path: script.py
+platforms:
+  - type: docker
+    image: openproblems/base_python:1.0.0
+  - type: nextflow
+    directives:
+      label: [midtime, highmem, midcpu]
diff --git a/src/datasets/processors/hvg/script.py b/src/datasets/processors/hvg/script.py
new file mode 100644
index 0000000000..60af4317bb
--- /dev/null
+++ b/src/datasets/processors/hvg/script.py
@@ -0,0 +1,36 @@
+
+import scanpy as sc
+
+### VIASH START
+par = {
+    'input': 'work/ca/0751ff85df6f9478cb7bda5a705cad/zebrafish.sqrt_cpm.pca.output.h5ad',
+    'input_layer': 'normalized',
+    'output': 'dataset.h5ad',
+    'var_hvg': 'hvg',
+    'var_hvg_score': 'hvg_score',
+    'num_features': 100
+}
+### VIASH END
+
+print(">> Load data", flush=True)
+adata = sc.read_h5ad(par['input'])
+
+print(">> Look for layer", flush=True)
+layer = adata.X if not par['input_layer'] else adata.layers[par['input_layer']]
+
+print(">> Run HVG", flush=True)
+out = sc.pp.highly_variable_genes(
+    adata,
+    layer=par["input_layer"],
+    n_top_genes=par["num_features"],
+    flavor='cell_ranger',
+    inplace=False
+)
+
+print(">> Storing output", flush=True)
+adata.var[par["var_hvg"]] = out['highly_variable'].values
+adata.var[par["var_hvg_score"]] = out['dispersions_norm'].values
+
+print(">> Writing data", flush=True)
+adata.write_h5ad(par['output'])
+
diff --git a/src/datasets/processors/knn/config.vsh.yaml b/src/datasets/processors/knn/config.vsh.yaml
new file mode 100644
index 0000000000..9908fe9086
--- /dev/null
+++ b/src/datasets/processors/knn/config.vsh.yaml
@@ -0,0 +1,13 @@
+__merge__: ../../api/comp_processor_knn.yaml
+functionality:
+  name: "knn"
+  description: "Compute KNN"
+  resources:
+    - type: python_script
+      path: script.py
+platforms:
+  - type: docker
+    image: openproblems/base_python:1.0.0
+  - type: nextflow
+    directives:
+      label: [midtime, highmem, midcpu]
diff --git a/src/datasets/processors/knn/script.py b/src/datasets/processors/knn/script.py
new file mode 100644
index 0000000000..ae364f6ba3
--- /dev/null
+++ b/src/datasets/processors/knn/script.py
@@ -0,0 +1,27 @@
+
+import scanpy as sc
+
+### VIASH START
+par = {
+    'input': 'work/ca/0751ff85df6f9478cb7bda5a705cad/zebrafish.sqrt_cpm.pca.output.h5ad',
+    'input_layer': 'normalized',
+    'output': 'dataset.h5ad',
+    'key_added': 'knn',
+    'num_neighbors': 15
+}
+### VIASH END
+
+print(">> Load data", flush=True)
+adata = sc.read(par['input'])
+
+print(">> Run kNN", flush=True)
+sc.pp.neighbors(
+    adata,
+    use_rep='X_pca',
+    key_added=par['key_added'],
+    n_neighbors=par['num_neighbors']
+)
+
+print(">> Writing data", flush=True)
+adata.write_h5ad(par['output'])
+
diff --git a/src/datasets/processors/pca/config.vsh.yaml b/src/datasets/processors/pca/config.vsh.yaml
new file mode 100644
index 0000000000..7f0213b922
--- /dev/null
+++ b/src/datasets/processors/pca/config.vsh.yaml
@@ -0,0 +1,17 @@
+__merge__: ../../api/comp_processor_pca.yaml
+functionality:
+  name: "pca"
+  description: "Compute PCA"
+  resources:
+    -
type: python_script + path: script.py + # test_resources: + # - type: python_script + # path: test_script.py + # - path: "../../../resources_test/common/pancreas" +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/datasets/processors/pca/script.py b/src/datasets/processors/pca/script.py new file mode 100644 index 0000000000..d56d376259 --- /dev/null +++ b/src/datasets/processors/pca/script.py @@ -0,0 +1,39 @@ + +import scanpy as sc + +### VIASH START +par = { + 'input': 'resources_test/common/pancreas/dataset.h5ad', + 'input_layer': 'log_cp10k', + 'output': 'dataset.h5ad', + 'obsm_embedding': 'X_pca', + 'varm_loadings': 'pca_loadings', + 'uns_variance': 'pca_variance', + 'num_components': 25 +} +### VIASH END + +print(">> Load data", flush=True) +adata = sc.read(par['input']) + +print(">> Look for layer", flush=True) +layer = adata.X if not par['input_layer'] else adata.layers[par['input_layer']] + +print(">> Run PCA", flush=True) +X_pca, loadings, variance, variance_ratio = sc.tl.pca( + layer, + n_comps=par["num_components"], + return_info=True +) + +print(">> Storing output", flush=True) +adata.obsm[par["obsm_embedding"]] = X_pca +adata.varm[par["varm_loadings"]] = loadings.T +adata.uns[par["uns_variance"]] = { + "variance": variance, + "variance_ratio": variance_ratio +} + +print(">> Writing data", flush=True) +adata.write_h5ad(par['output']) + diff --git a/src/datasets/processors/subsample/config.vsh.yaml b/src/datasets/processors/subsample/config.vsh.yaml new file mode 100644 index 0000000000..4e52e93db5 --- /dev/null +++ b/src/datasets/processors/subsample/config.vsh.yaml @@ -0,0 +1,51 @@ +__merge__: ../../api/comp_processor_subset.yaml +functionality: + name: "subsample" + description: "Subsample an h5ad file" + arguments: + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Cell type indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." 
+ example: 123 + resources: + - type: python_script + path: script.py + test_resources: + - type: python_script + path: test_script.py + - path: /resources_test/common/pancreas +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + test_setup: + - type: python + packages: + - viashpy + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/datasets/processors/subsample/script.py b/src/datasets/processors/subsample/script.py new file mode 100644 index 0000000000..c2347349c0 --- /dev/null +++ b/src/datasets/processors/subsample/script.py @@ -0,0 +1,145 @@ +import scanpy as sc +import random +import numpy as np + +### VIASH START +par = { + "input": "resources_test/common/scicar_cell_lines/temp_mod1_full.h5ad", + "input_mod2": "resources_test/common/scicar_cell_lines/temp_mod2_full.h5ad", + "n_obs": 600, + "n_vars": 1500, + "keep_cell_type_categories": None, + "keep_batch_categories": None, + "keep_features": None, + "keep_cell_type_categories": None, + "keep_batch_categories": None, + "even": False, + "output": "subsample_mod1.h5ad", + "output_mod2": "subsample_mod2.h5ad", + "seed": 123 +} +### VIASH END + +if par["seed"]: + print(f">> Setting seed to {par['seed']}", flush=True) + random.seed(par["seed"]) + +print(">> Load data", flush=True) +adata_input = sc.read_h5ad(par["input"]) + +if par["input_mod2"] is not None: + adata_mod2 = sc.read_h5ad(par["input_mod2"]) + +# copy counts to .X because otherwise filter_genes and filter_cells won't work +adata_input.X = adata_input.layers["counts"] +if par["input_mod2"] is not None: + adata_mod2.X = adata_mod2.layers["counts"] + +print(">> Determining output shape", flush=True) +min_obs_list = [par["n_obs"], adata_input.shape[0]] +if par["input_mod2"] is not None: + min_obs_list.append(adata_mod2.shape[0]) +n_obs = min(min_obs_list) + +min_vars_list = [par["n_vars"], adata_input.shape[1]] +if par["input_mod2"] is not None: + min_vars_list.append(adata_mod2.shape[1]) +n_vars = min(min_vars_list) + +print(">> Subsampling the observations", flush=True) +obs_filt = np.ones(dtype=np.bool_, shape=adata_input.n_obs) + +# subset by cell_type +if par.get("keep_cell_type_categories"): + print(f">> Selecting cell_type_categories {par['keep_cell_type_categories']}") + obs_filt = obs_filt & adata_input.obs["cell_type"].isin(par["keep_cell_type_categories"]) + +# subset by batch +if par.get("keep_batch_categories"): + print(f">> Selecting cell_type_categories {par['keep_batch_categories']}") + obs_filt = obs_filt & adata_input.obs["batch"].isin(par["keep_batch_categories"]) + +# subsample evenly across batches or not +if par.get("even"): + obs_evenly = "batch" + choice_ix = np.where(obs_filt)[0] + choice_batch = adata_input[choice_ix].obs[obs_evenly] + names, counts = np.unique(choice_batch, return_counts=True) + probs = dict(zip(names, 1 / counts / len(names))) + + choice_probs = [ probs[batch] for batch in choice_batch ] + obs_index = np.random.choice(choice_ix, size=n_obs, replace=False, p=choice_probs) +else: + obs_index = np.random.choice(np.where(obs_filt)[0], n_obs, replace=False) + +# subsample obs +adata_output = adata_input[obs_index].copy() +if par["input_mod2"] is not None: + adata_output_mod2 = adata_mod2[obs_index].copy() + +# filter cells and genes +if par["input_mod2"] is not None: + n_cells = adata_output.X.sum(axis=1).A.flatten() + n_cells_mod2 = adata_output_mod2.X.sum(axis=1).A.flatten() + keep_cells = np.minimum(n_cells, n_cells_mod2) > 1 + adata_output = adata_output[keep_cells, :].copy() + 
adata_output_mod2 = adata_output_mod2[keep_cells, :].copy() + + sc.pp.filter_genes(adata_output, min_cells=1) + sc.pp.filter_genes(adata_output_mod2, min_cells=1) + +else: + # todo: this should not remove features in keep_features! + print(">> Remove empty observations and features", flush=True) + sc.pp.filter_genes(adata_output, min_cells=1) + sc.pp.filter_cells(adata_output, min_counts=2) + +print(">> Subsampling the features", flush=True) +if par.get("keep_features"): + initial_filt = adata_output.var_names.isin(par["keep_features"]) + initial_idx, *_ = initial_filt.nonzero() + remaining_idx, *_ = (~initial_filt).nonzero() + rest_idx = remaining_idx[np.random.choice(len(remaining_idx), n_vars - len(initial_idx), replace=False)] + var_ix = np.concatenate([initial_idx, rest_idx]) +else: + var_ix = np.random.choice(adata_output.shape[1], n_vars, replace=False) + if par["input_mod2"] is not None: + var_ix_mod2 = np.random.choice(adata_output_mod2.shape[1], n_vars, replace=False) + +# subsample vars +adata_output = adata_output[:, var_ix].copy() +if par["input_mod2"] is not None: + adata_output_mod2 = adata_output_mod2[:, var_ix_mod2].copy() + +# filter cells and genes +if par["input_mod2"] is not None: + n_cells = adata_output.X.sum(axis=1).A.flatten() + n_cells_mod2 = adata_output_mod2.X.sum(axis=1).A.flatten() + keep_cells = np.minimum(n_cells, n_cells_mod2) > 1 + adata_output = adata_output[keep_cells, :].copy() + adata_output_mod2 = adata_output_mod2[keep_cells, :].copy() + + sc.pp.filter_genes(adata_output, min_cells=1) + sc.pp.filter_genes(adata_output_mod2, min_cells=1) + + +else: + # todo: this should not remove features in keep_features! + print(">> Remove empty observations and features", flush=True) + sc.pp.filter_genes(adata_output, min_cells=1) + sc.pp.filter_cells(adata_output, min_counts=2) + +print(">> Update dataset_id", flush=True) +adata_output.uns["dataset_id"] = adata_output.uns["dataset_id"] + "_subsample" +if par["input_mod2"] is not None: + adata_output_mod2.uns["dataset_id"] = adata_output_mod2.uns["dataset_id"] + "_subsample" + +# remove previously copied .X +del adata_output.X +if par["input_mod2"] is not None: + del adata_output_mod2.X + +print(">> Writing data", flush=True) +adata_output.write_h5ad(par["output"]) +if par["output_mod2"] is not None: + adata_output_mod2.write_h5ad(par["output_mod2"]) diff --git a/src/datasets/processors/subsample/test_script.py b/src/datasets/processors/subsample/test_script.py new file mode 100644 index 0000000000..80dde5d383 --- /dev/null +++ b/src/datasets/processors/subsample/test_script.py @@ -0,0 +1,64 @@ +import sys +import os +import pytest +import anndata as ad + +## VIASH START +meta = { + "resources_dir": "resources_test/common" +} +## VIASH END + +input_path = f"{meta['resources_dir']}/pancreas/dataset.h5ad" +input = ad.read_h5ad(input_path) + +def test_even_sampling(run_component): + output_path = "output.h5ad" + run_component([ + "--input", input_path, + "--output", output_path, + "--even", + "--seed", "123", + "--n_obs", "100", + "--n_vars", "120" + ]) + + # Checking whether file exists + assert os.path.exists(output_path), "Output file not found" + + # Check that test output fits expected API + output = ad.read_h5ad(output_path) + + assert output.n_obs <= 100, "n_obs should be <= 100" + assert output.n_vars <= 120, "n_vars should be <= 100" + + +def test_keep_functionality(run_component): + output_path = "output.h5ad" + + # keep_features = list(input.var_names[:10]) + # use genes with high enough expression + 
keep_features = ["ANP32E", "CBX5", "HMGB2"] + + run_component([ + "--input", input_path, + "--keep_cell_type_categories", "acinar:beta", + "--keep_batch_categories", "celseq:inDrop4:smarter", + "--keep_features", ":".join(keep_features), + "--output", output_path, + "--seed", "123" + ]) + + # Checking whether file exists + assert os.path.exists(output_path), "Output file not found" + + # Check that test output fits expected API + output = ad.read_h5ad(output_path) + + assert output.n_obs <= 500, "n_obs should be <= 500" + assert output.n_vars <= 500, "n_vars should be <= 500" + for feat in keep_features: + assert feat in output.var_names, f"{feat} should be in output.var_names" + +if __name__ == '__main__': + sys.exit(pytest.main([__file__, "--capture=no"], plugins=["viashpy"])) diff --git a/src/datasets/processors/svd/config.vsh.yaml b/src/datasets/processors/svd/config.vsh.yaml new file mode 100644 index 0000000000..bbad17f58c --- /dev/null +++ b/src/datasets/processors/svd/config.vsh.yaml @@ -0,0 +1,16 @@ +__merge__: ../../api/comp_processor_svd.yaml +functionality: + name: "svd" + description: "Compute SVD pca reduction" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: [scikit-learn] + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/datasets/processors/svd/script.py b/src/datasets/processors/svd/script.py new file mode 100644 index 0000000000..8c94be407a --- /dev/null +++ b/src/datasets/processors/svd/script.py @@ -0,0 +1,45 @@ +import anndata as ad +import sklearn.decomposition + + +## VIASH START +par = { + "input": "resources_test/common/scicar_cell_lines/normalized_mod1.h5ad", + "input_mod2": "resources_test/common/scicar_cell_lines/normalized_mod2.h5ad", + "output": "output.h5ad", + "input_layer": "normalized", + "obsm_embedding": "X_svd", + "num_components": 100, +} +## VIASH END + +print(">> Load data", flush=True) +adata = ad.read(par["input"]) +if par["input_mod2"] is not None: + adata2 = ad.read(par["input_mod2"]) + +print(">> check parameters", flush=True) +min_list = [par["num_components"], min(adata.layers[par["input_layer"]].shape) - 1] + +if par["input_mod2"] is not None: + min_list.append(min(adata2.layers[par["input_layer"]].shape) - 1) + +n_svd = min(min_list) + + +print(">> Run SVD", flush=True) +svd1 = sklearn.decomposition.TruncatedSVD(n_svd).fit_transform(adata.layers[par["input_layer"]]) +if par["input_mod2"] is not None: + svd2 = sklearn.decomposition.TruncatedSVD(n_svd).fit_transform(adata2.layers[par["input_layer"]]) + +print(">> Storing output", flush=True) +adata.obsm[par["obsm_embedding"]] = svd1 +if par["input_mod2"] is not None: + adata2.obsm[par["obsm_embedding"]] = svd2 + + +print(">> Writing data", flush=True) +adata.write_h5ad(par["output"]) +if par["input_mod2"] is not None: + adata2.write_h5ad(par["output_mod2"]) + diff --git a/src/datasets/resource_scripts/cellxgene_census.sh b/src/datasets/resource_scripts/cellxgene_census.sh new file mode 100755 index 0000000000..5d6181f91e --- /dev/null +++ b/src/datasets/resource_scripts/cellxgene_census.sh @@ -0,0 +1,153 @@ +#!/bin/bash + +# template for adding new datasets +# - id: cellxgene_census/ +# species: +# census_version: "2023-07-25" +# obs_value_filter: "dataset_id == ''" +# obs_batch: +# dataset_name: +# dataset_summary: +# dataset_description: +# dataset_url: +# dataset_reference: +# dataset_organism: + +# not sure which dataset ids to use +# - id: 
cellxgene_census/human_brain_atlas +# species: homo_sapiens +# census_version: "2023-07-25" +# obs_value_filter: "dataset_id == ''" # <--- ? +# obs_batch: donor_id +# dataset_name: Human Brain Atlas +# dataset_summary: Single-Cell DNA Methylation and 3D Genome Human Brain Atlas +# dataset_description: Delineating the gene regulatory programs underlying complex cell types is fundamental for understanding brain functions in health and disease. Here, we comprehensively examine human brain cell epigenomes by probing DNA methylation and chromatin conformation at single-cell resolution in over 500,000 cells from 46 brain regions. We identified 188 cell types and characterized their molecular signatures. Integrative analyses revealed concordant changes in DNA methylation, chromatin accessibility, chromatin organization, and gene expression across cell types, cortical areas, and basal ganglia structures. With these resources, we developed scMCodes that reliably predict brain cell types using their methylation status at select genomic sites. This multimodal epigenomic brain cell atlas provides new insights into the complexity of cell type-specific gene regulation in the adult human brain. +# dataset_url: https://cellxgene.cziscience.com/collections/fdebfda9-bb9a-4b4b-97e5-651097ea07b0 +# dataset_reference: tian2023singlecell +# dataset_organism: homo_sapiens + +cat > "/tmp/params.yaml" << 'HERE' +param_list: + - id: cellxgene_census/mouse_pancreas_atlas + species: mus_musculus + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '49e4ffcc-5444-406d-bdee-577127404ba8'" + obs_batch: donor_id + dataset_name: Mouse Pancreatic Islet Atlas + dataset_summary: Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes + dataset_description: To better understand pancreatic β-cell heterogeneity we generated a mouse pancreatic islet atlas capturing a wide range of biological conditions. The atlas contains scRNA-seq datasets of over 300,000 mouse pancreatic islet cells, of which more than 100,000 are β-cells, from nine datasets with 56 samples, including two previously unpublished datasets. The samples vary in sex, age (ranging from embryonic to aged), chemical stress, and disease status (including T1D NOD model development and two T2D models, mSTZ and db/db) together with different diabetes treatments. Additional information about data fields is available in anndata uns field 'field_descriptions' and on https://github.com/theislab/mm_pancreas_atlas_rep/blob/main/resources/cellxgene.md. + dataset_url: https://cellxgene.cziscience.com/collections/296237e2-393d-4e31-b590-b03f74ac5070 + dataset_reference: hrovatin2023delineating + dataset_organism: mus_musculus + - id: cellxgene_census/hcla + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '066943a2-fdac-4b29-b348-40cede398e4e'" + obs_batch: donor_id + dataset_name: Human Lung Cell Atlas + dataset_summary: An integrated cell atlas of the human lung in health and disease (core) + dataset_description: The integrated Human Lung Cell Atlas (HLCA) represents the first large-scale, integrated single-cell reference atlas of the human lung. It consists of over 2 million cells from the respiratory tract of 486 individuals, and includes 49 different datasets. It is split into the HLCA core, and the extended or full HLCA. 
The HLCA core includes data of healthy lung tissue from 107 individuals, and includes manual cell type annotations based on consensus across 6 independent experts, as well as demographic, biological and technical metadata. + dataset_url: https://cellxgene.cziscience.com/collections/6f6d381a-7701-4781-935c-db10d30de293 + dataset_reference: sikkema2023integrated + dataset_organism: homo_sapiens + - id: cellxgene_census/tabula_sapiens + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '53d208b0-2cfd-4366-9866-c3c6114081bc'" + obs_batch: [donor_id, assay] + dataset_name: Tabula Sapiens + dataset_summary: A multiple-organ, single-cell transcriptomic atlas of humans + dataset_description: Tabula Sapiens is a benchmark, first-draft human cell atlas of nearly 500,000 cells from 24 organs of 15 normal human subjects. This work is the product of the Tabula Sapiens Consortium. Taking the organs from the same individual controls for genetic background, age, environment, and epigenetic effects and allows detailed analysis and comparison of cell types that are shared between tissues. Our work creates a detailed portrait of cell types as well as their distribution and variation in gene expression across tissues and within the endothelial, epithelial, stromal and immune compartments. + dataset_url: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5 + dataset_reference: consortium2022tabula + dataset_organism: homo_sapiens + - id: cellxgene_census/immune_cell_atlas + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '1b9d8702-5af8-4142-85ed-020eb06ec4f6'" + obs_batch: donor_id + dataset_name: Immune Cell Atlas + dataset_summary: Cross-tissue immune cell analysis reveals tissue-specific features in humans + dataset_description: Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing. + dataset_url: https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3 + dataset_reference: dominguez2022crosstissue + dataset_organism: homo_sapiens + - id: cellxgene_census/gtex_v9 + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '4ed927e9-c099-49af-b8ce-a2652d069333'" + obs_batch: donor_id + dataset_name: GTEX v9 + dataset_summary: Single-nucleus cross-tissue molecular reference maps to decipher disease gene function + dataset_description: Understanding the function of genes and their regulation in tissue homeostasis and disease requires knowing the cellular context in which genes are expressed in tissues across the body. 
Single cell genomics allows the generation of detailed cellular atlases in human tissues, but most efforts are focused on single tissue types. Here, we establish a framework for profiling multiple tissues across the human body at single-cell resolution using single nucleus RNA-Seq (snRNA-seq), and apply it to 8 diverse, archived, frozen tissue types (three donors per tissue). We apply four snRNA-seq methods to each of 25 samples from 16 donors, generating a cross-tissue atlas of 209,126 nuclei profiles, and benchmark them vs. scRNA-seq of comparable fresh tissues. We use a conditional variational autoencoder (cVAE) to integrate an atlas across tissues, donors, and laboratory methods. We highlight shared and tissue-specific features of tissue-resident immune cells, identifying tissue-restricted and non-restricted resident myeloid populations. These include a cross-tissue conserved dichotomy between LYVE1- and HLA class II-expressing macrophages, and the broad presence of LAM-like macrophages across healthy tissues that is also observed in disease. For rare, monogenic muscle diseases, we identify cell types that likely underlie the neuromuscular, metabolic, and immune components of these diseases, and biological processes involved in their pathology. For common complex diseases and traits analyzed by GWAS, we identify the cell types and gene modules that potentially underlie disease mechanisms. The experimental and analytical frameworks we describe will enable the generation of large-scale studies of how cellular and molecular processes vary across individuals and populations. + dataset_url: https://cellxgene.cziscience.com/collections/a3ffde6c-7ad2-498a-903c-d58e732f7470 + dataset_reference: eraslan2022singlenucleus + dataset_organism: homo_sapiens + - id: cellxgene_census/human_retina_cell_atlas + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id == 'd6505c89-c43d-4c28-8c4f-7351a5fd5528'" + obs_batch: donor_id + dataset_name: Human Retina Cell Atlas + dataset_summary: Single cell atlas of the human retina + dataset_description: As the light sensing part of the visual system, the human retina is composed of five classes of neuron, including photoreceptors, horizontal cells, amacrine, bipolar, and retinal ganglion cells. Each class of neuron can be further classified into subgroups with the abundance varying three orders of magnitude. Therefore, to capture all cell types in the retina and generate a complete single cell reference atlas, it is essential to scale up from currently published single cell profiling studies to improve the sensitivity. In addition, to gain a better understanding of gene regulation at single cell level, it is important to include sufficient scATAC-seq data in the reference. To fill the gap, we performed snRNA-seq and snATAC-seq for the retina from healthy donors. To further increase the size of the dataset, we then collected and incorporated publicly available datasets. All data underwent a unified preprocessing pipeline and data integration. Multiple integration methods were benchmarked by scIB, and scVI was chosen. To harness the power of multiomics, snATAC-seq datasets were also preprocessed, and scGlue was used to generate co-embeddings between snRNA-seq and snATAC-seq cells. To facilitate the public use of references, we employ CELLxGENE and UCSC Cell Browser for visualization. 
By combining previously published and newly generated datasets, a single cell atlas of the human retina that is composed of 2.5 million single cells from 48 donors has been generated. As a result, over 90 distinct cell types are identified based on the transcriptomics profile with the rarest cell type accounting for about 0.01% of the cell population. In addition, open chromatin profiling has been generated for over 400K nuclei via single nuclei ATAC-seq, allowing systematic characterization of cis-regulatory elements for individual cell type. Integrative analysis reveals intriguing differences in the transcriptome, chromatin landscape, and gene regulatory network among cell class, subgroup, and type. In addition, changes in cell proportion, gene expression and chromatin openness have been observed between different gender and over age. Accessible through interactive browsers, this study represents the most comprehensive reference cell atlas of the human retina to date. As part of the human cell atlas project, this resource lays the foundation for further research in understanding retina biology and diseases. + dataset_url: https://cellxgene.cziscience.com/collections/4c6eaf5c-6d57-4c76-b1e9-60df8c655f1e + dataset_reference: li2023integrated + dataset_organism: homo_sapiens + - id: cellxgene_census/dkd + species: homo_sapiens + census_version: "2023-07-25" + obs_value_filter: "dataset_id in ['ad0bf220-dd49-4b71-bb5c-576fee675d2b', 'e067e5ca-e53e-485f-aa8e-efd5435229c8']" + obs_batch: donor_id + dataset_name: Diabetic Kidney Disease + dataset_summary: Multimodal single cell sequencing implicates chromatin accessibility and genetic background in diabetic kidney disease progression + dataset_description: Multimodal single cell sequencing is a powerful tool for interrogating cell-specific changes in transcription and chromatin accessibility. We performed single nucleus RNA (snRNA-seq) and assay for transposase accessible chromatin sequencing (snATAC-seq) on human kidney cortex from donors with and without diabetic kidney disease (DKD) to identify altered signaling pathways and transcription factors associated with DKD. Both snRNA-seq and snATAC-seq had an increased proportion of VCAM1+ injured proximal tubule cells (PT_VCAM1) in DKD samples. PT_VCAM1 has a pro-inflammatory expression signature and transcription factor motif enrichment implicated NFkB signaling. We used stratified linkage disequilibrium score regression to partition heritability of kidney-function-related traits using publicly-available GWAS summary statistics. Cell-specific PT_VCAM1 peaks were enriched for heritability of chronic kidney disease (CKD), suggesting that genetic background may regulate chromatin accessibility and DKD progression. snATAC-seq found cell-specific differentially accessible regions (DAR) throughout the nephron that change accessibility in DKD and these regions were enriched for glucocorticoid receptor (GR) motifs. Changes in chromatin accessibility were associated with decreased expression of insulin receptor, increased gluconeogenesis, and decreased expression of the GR cytosolic chaperone, FKBP5, in the diabetic proximal tubule. Cleavage under targets and release using nuclease (CUT&RUN) profiling of GR binding in bulk kidney cortex and an in vitro model of the proximal tubule (RPTEC) showed that DAR co-localize with GR binding sites. 
CRISPRi silencing of GR response elements (GRE) in the FKBP5 gene body reduced FKBP5 expression in RPTEC, suggesting that reduced FKBP5 chromatin accessibility in DKD may alter cellular response to GR. We developed an open-source tool for single cell allele specific analysis (SALSA) to model the effect of genetic background on gene expression. Heterozygous germline single nucleotide variants (SNV) in proximal tubule ATAC peaks were associated with allele-specific chromatin accessibility and differential expression of target genes within cis-coaccessibility networks. Partitioned heritability of proximal tubule ATAC peaks with a predicted allele-specific effect was enriched for eGFR, suggesting that genetic background may modify DKD progression in a cell-specific manner. + dataset_url: https://cellxgene.cziscience.com/collections/b3e2c6e3-9b05-4da9-8f42-da38a664b45b + dataset_reference: wilson2022multimodal + dataset_organism: homo_sapiens + - id: cellxgene_census/hypomap + species: mus_musculus + census_version: "2023-07-25" + obs_value_filter: "dataset_id == 'dbb4e1ed-d820-4e83-981f-88ef7eb55a35'" + obs_batch: donor_id + dataset_name: HypoMap + dataset_summary: A unified single cell gene expression atlas of the murine hypothalamus + dataset_description: The hypothalamus plays a key role in coordinating fundamental body functions. Despite recent progress in single-cell technologies, a unified catalogue and molecular characterization of the heterogeneous cell types and, specifically, neuronal subtypes in this brain region are still lacking. Here we present an integrated reference atlas “HypoMap” of the murine hypothalamus consisting of 384,925 cells, with the ability to incorporate new additional experiments. We validate HypoMap by comparing data collected from SmartSeq2 and bulk RNA sequencing of selected neuronal cell types with different degrees of cellular heterogeneity. 
+ dataset_url: https://cellxgene.cziscience.com/collections/d86517f0-fa7e-4266-b82e-a521350d6d36 + dataset_reference: steuernagel2022hypomap + dataset_organism: mus_musculus + +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +output_pca: force_null +output_hvg: force_null +output_knn: force_null +publish_dir: s3://openproblems-data/resources/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withLabel: highmem { + memory = '350GB' + } + withName: '.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_cellxgene_census/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "/tmp/params.yaml" \ + --config /tmp/nextflow.config \ + --labels cellxgene_census,dataset_loader diff --git a/src/datasets/resource_scripts/dataset_info.sh b/src/datasets/resource_scripts/dataset_info.sh new file mode 100755 index 0000000000..04c032916f --- /dev/null +++ b/src/datasets/resource_scripts/dataset_info.sh @@ -0,0 +1,54 @@ +#!/bin/bash + +DATASETS_DIR="s3://openproblems-data/resources/datasets" + +cat > "/tmp/params.yaml" << HERE +param_list: + - id: openproblems_v1 + input_states: "$DATASETS_DIR/openproblems_v1/**/log_cp10k/state.yaml" + rename_keys: 'input:output_dataset' + - id: openproblems_v1_multimodal + input_states: "$DATASETS_DIR/openproblems_v1_multimodal/**/log_cp10k/state.yaml" + rename_keys: 'input:output_mod1' + - id: cellxgene_census + input_states: "$DATASETS_DIR/cellxgene_census/**/log_cp10k/state.yaml" + rename_keys: 'input:output_dataset' +settings: '{"output": "dataset_info.yaml"}' +output_state: state.yaml +publish_dir: "$DATASETS_DIR" +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withLabel: highmem { + memory = '350GB' + } + withName: '.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --entry-name auto \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/extract_dataset_info/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "/tmp/params.yaml" \ + --config /tmp/nextflow.config + + +# # run locally after the above has finished +# nextflow run . 
\ +# -main-script target/nextflow/common/process_task_results/get_dataset_info/main.nf \ +# -profile docker \ +# -resume \ +# --input "$DATASETS_DIR/dataset_info.yaml" \ +# --task_id "common" \ +# --output "dataset_info.json" \ +# --output_state state.yaml \ +# --publish_dir "../website/documentation/reference/datasets/data/" \ No newline at end of file diff --git a/src/datasets/resource_scripts/openproblems_neurips2021_multimodal.sh b/src/datasets/resource_scripts/openproblems_neurips2021_multimodal.sh new file mode 100755 index 0000000000..a306ba2ef8 --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_neurips2021_multimodal.sh @@ -0,0 +1,46 @@ +#!/bin/bash + +params_file="/tmp/datasets_openproblems_neurips2021_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_neurips2021/bmmc_cite + # input: "/tmp/neurips2021_bmmc_cite.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fcite%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ADT + dataset_name: NeurIPS2021 CITE-Seq + dataset_organism: homo_sapiens + dataset_summary: Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + + - id: openproblems_neurips2021/bmmc_multiome + # input: "/tmp/neurips2021_bmmc_multiome.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fmultiome%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ATAC + dataset_name: NeurIPS2021 Multiome + dataset_organism: homo_sapiens + dataset_summary: Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site."
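+ # The keys below apply to both BMMC entries in param_list above; '$id' in the output paths is a per-dataset placeholder, so each dataset is published under its own id inside publish_dir.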
+ +dataset_url: "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122" +dataset_reference: luecken2021neurips +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +publish_dir: s3://openproblems-data/resources/datasets +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "$params_file" \ + --config src/wf_utils/labels_tw.config \ + --labels neurips2021,dataset_loader \ diff --git a/src/datasets/resource_scripts/openproblems_neurips2021_multimodal_test.sh b/src/datasets/resource_scripts/openproblems_neurips2021_multimodal_test.sh new file mode 100755 index 0000000000..be8444371b --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_neurips2021_multimodal_test.sh @@ -0,0 +1,43 @@ +#!/bin/bash + +params_file="/tmp/datasets_openproblems_neurips2021_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_neurips2021/bmmc_cite + # input: "/tmp/neurips2021_bmmc_cite.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fcite%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ADT + dataset_name: OpenProblems NeurIPS2021 CITE-Seq + dataset_organism: homo_sapiens + dataset_summary: Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + + - id: openproblems_neurips2021/bmmc_multiome + # input: "/tmp/neurips2021_bmmc_multiome.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fmultiome%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ATAC + dataset_name: OpenProblems NeurIPS2021 Multiome + dataset_organism: homo_sapiens + dataset_summary: Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." 
+ +dataset_url: "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122" +dataset_reference: luecken2021neurips +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +publish_dir: resources/datasets/openproblems_neurips2021 +HERE + +export NXF_VER=23.10.1 +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf \ + -profile docker \ + -resume \ + -params-file "$params_file" diff --git a/src/datasets/resource_scripts/openproblems_neurips2022_pbmc.sh b/src/datasets/resource_scripts/openproblems_neurips2022_pbmc.sh new file mode 100755 index 0000000000..e3e6783a8e --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_neurips2022_pbmc.sh @@ -0,0 +1,57 @@ +#!/bin/bash + +set -e + +params_file="/tmp/datasets_openproblems_neurips2022_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_neurips2022/pbmc_cite + input_mod1: s3://openproblems-nextflow/datasets_private/neurips2022/cite_rna_merged.h5ad + input_mod2: s3://openproblems-nextflow/datasets_private/neurips2022/cite_prot_merged.h5ad + mod1: GEX + mod2: ADT + dataset_name: OpenProblems NeurIPS2022 CITE-Seq + dataset_organism: homo_sapiens + dataset_summary: Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + + - id: openproblems_neurips2022/pbmc_multiome + input_mod1: s3://openproblems-nextflow/datasets_private/neurips2022/multiome_rna_merged.h5ad + input_mod2: s3://openproblems-nextflow/datasets_private/neurips2022/multiome_atac_merged.h5ad + mod1: GEX + mod2: ATAC + dataset_name: OpenProblems NeurIPS2022 Multiome + dataset_organism: homo_sapiens + dataset_summary: Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." 
+ +dataset_url: "https://www.kaggle.com/competitions/open-problems-multimodal/data" +dataset_reference: lance2024predicting +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +publish_dir: s3://openproblems-data/resources/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf \ + --workspace 53907369739130 \ + --compute-env 1pK56PjjzeraOOC2LDZvN2 \ + --params-file "$params_file" \ + --config /tmp/nextflow.config \ + --labels openproblems_neurips2022_pbmc,dataset_loader \ diff --git a/src/datasets/resource_scripts/openproblems_v1.sh b/src/datasets/resource_scripts/openproblems_v1.sh new file mode 100755 index 0000000000..8d40e57c46 --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_v1.sh @@ -0,0 +1,182 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +params_file="/tmp/datasets_openproblems_v1_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_v1/allen_brain_atlas + obs_cell_type: label + layer_counts: counts + input_id: allen_brain_atlas + dataset_name: Mouse Brain Atlas + dataset_url: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71585 + dataset_reference: tasic2016adult + dataset_summary: Adult mouse primary visual cortex + dataset_description: A murine brain atlas with adjacent cell types as assumed benchmark truth, inferred from deconvolution proportion correlations using matching 10x Visium slides (see Dimitrov et al., 2022). + dataset_organism: mus_musculus + var_feature_name: index + + - id: openproblems_v1/cengen + obs_cell_type: cell_type + obs_batch: experiment_code + obs_tissue: tissue + layer_counts: counts + input_id: cengen + dataset_name: CeNGEN + dataset_url: https://www.cengen.org + dataset_reference: hammarlund2018cengen + dataset_summary: Complete Gene Expression Map of an Entire Nervous System + dataset_description: 100k FACS-isolated C. elegans neurons from 17 experiments sequenced on 10x Genomics. + dataset_organism: caenorhabditis_elegans + var_feature_name: index + + - id: openproblems_v1/immune_cells + obs_cell_type: final_annotation + obs_batch: batch + obs_tissue: tissue + layer_counts: counts + input_id: immune_cells + dataset_name: Human immune + dataset_url: https://theislab.github.io/scib-reproducibility/dataset_immune_cell_hum.html + dataset_reference: luecken2022benchmarking + dataset_summary: Human immune cells dataset from the scIB benchmarks + dataset_description: Human immune cells from peripheral blood and bone marrow taken from 5 datasets comprising 10 batches across technologies (10X, Smart-seq2). 
+ dataset_organism: homo_sapiens + var_feature_name: index + + - id: openproblems_v1/mouse_blood_olsson_labelled + obs_cell_type: celltype + layer_counts: counts + input_id: mouse_blood_olsson_labelled + dataset_name: Mouse myeloid + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70245 + dataset_reference: olsson2016single + dataset_summary: Myeloid lineage differentiation from mouse blood + dataset_description: 660 FACS-isolated myeloid cells from 9 experiments sequenced using C1 Fluidigm and SMARTseq in 2016 by Olsson et al. + dataset_organism: mus_musculus + var_feature_name: index + + - id: openproblems_v1/mouse_hspc_nestorowa2016 + obs_cell_type: cell_type_label + layer_counts: counts + input_id: mouse_hspc_nestorowa2016 + dataset_name: Mouse HSPC + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81682 + dataset_reference: nestorowa2016single + dataset_summary: Haematopoietic stem and progenitor cells from mouse bone marrow + dataset_description: 1656 hematopoietic stem and progenitor cells from mouse bone marrow. Sequenced by Smart-seq2. + dataset_organism: mus_musculus + var_feature_name: name + var_feature_id: converted_alias + + + - id: openproblems_v1/pancreas + obs_cell_type: celltype + obs_batch: tech + layer_counts: counts + input_id: pancreas + dataset_name: Human pancreas + dataset_url: https://theislab.github.io/scib-reproducibility/dataset_pancreas.html + dataset_reference: luecken2022benchmarking + dataset_summary: Human pancreas cells dataset from the scIB benchmarks + dataset_description: Human pancreatic islet scRNA-seq data from 6 datasets across technologies (CEL-seq, CEL-seq2, Smart-seq2, inDrop, Fluidigm C1, and SMARTER-seq). + dataset_organism: homo_sapiens + var_feature_name: index + + # disabled as this is not working in openproblems v1 + # - id: openproblems_v1/tabula_muris_senis_droplet_lung + # obs_cell_type: cell_type + # obs_batch: donor_id + # layer_counts: counts + # input_id: tabula_muris_senis_droplet_lung + # dataset_name: Tabula Muris Senis Lung + # dataset_url: https://tabula-muris-senis.ds.czbiohub.org + # dataset_reference: tabula2020single + # dataset_summary: Aging mouse lung cells from Tabula Muris Senis + # dataset_description: All lung cells from 10x profiles in Tabula Muris Senis, a 500k cell-atlas from 18 organs and tissues across the mouse lifespan. + # dataset_organism: mus_musculus + + - id: openproblems_v1/tenx_1k_pbmc + layer_counts: counts + input_id: tenx_1k_pbmc + dataset_name: 1k PBMCs + dataset_url: https://www.10xgenomics.com/resources/datasets/1-k-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-standard-3-0-0 + dataset_reference: 10x2018pbmc + dataset_summary: 1k peripheral blood mononuclear cells from a healthy donor + dataset_description: 1k Peripheral Blood Mononuclear Cells (PBMCs) from a healthy donor. Sequenced on 10X v3 chemistry in November 2018 by 10X Genomics. + dataset_organism: homo_sapiens + var_feature_name: index + + - id: openproblems_v1/tenx_5k_pbmc + layer_counts: counts + input_id: tenx_5k_pbmc + dataset_name: 5k PBMCs + dataset_url: https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-with-cell-surface-proteins-v-3-chemistry-3-1-standard-3-1-0 + dataset_reference: 10x2019pbmc + dataset_summary: 5k peripheral blood mononuclear cells from a healthy donor + dataset_description: 5k Peripheral Blood Mononuclear Cells (PBMCs) from a healthy donor. Sequenced on 10X v3 chemistry in July 2019 by 10X Genomics.
+ dataset_organism: homo_sapiens + var_feature_name: index + var_feature_id: gene_ids + + + - id: openproblems_v1/tnbc_wu2021 + obs_cell_type: celltype_minor + layer_counts: counts + input_id: tnbc_wu2021 + dataset_name: Triple-Negative Breast Cancer + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE118389 + dataset_reference: wu2021single + dataset_summary: 1535 cells from six fresh triple-negative breast cancer tumors. + dataset_description: 1535 cells from six TNBC donors by (Wu et al., 2021). This dataset includes cytokine activities, inferred using a multivariate linear model with cytokine-focused signatures, as assumed true cell-cell communication (Dimitrov et al., 2022). + dataset_organism: homo_sapiens + var_feature_name: index + + - id: openproblems_v1/zebrafish + obs_cell_type: cell_type + obs_batch: lab + layer_counts: counts + input_id: zebrafish + dataset_name: Zebrafish embryonic cells + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112294 + dataset_reference: wagner2018single + dataset_summary: Single-cell mRNA sequencing of zebrafish embryonic cells. + dataset_description: 90k cells from zebrafish embryos throughout the first day of development, with and without a knockout of chordin, an important developmental gene. + dataset_organism: danio_rerio + var_feature_name: index + var_feature_id: index + + +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +output_pca: force_null +output_hvg: force_null +output_knn: force_null +publish_dir: s3://openproblems-data/resources/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_openproblems_v1/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "$params_file" \ + --config /tmp/nextflow.config \ + --labels openproblems_v1,dataset_loader \ No newline at end of file diff --git a/src/datasets/resource_scripts/openproblems_v1_multimodal.sh b/src/datasets/resource_scripts/openproblems_v1_multimodal.sh new file mode 100755 index 0000000000..2d516a8ccb --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_v1_multimodal.sh @@ -0,0 +1,85 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +params_file="/tmp/datasets_openproblems_v1_multimodal_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_v1_multimodal/citeseq_cbmc + input_id: citeseq_cbmc + dataset_name: "CITE-Seq CBMC" + dataset_summary: "CITE-seq profiles of 8k Cord Blood Mononuclear Cells" + dataset_description: "8k cord blood mononuclear cells profiled by CITEseq using a panel of 13 antibodies." 
+ dataset_reference: stoeckius2017simultaneous + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100866 + dataset_organism: homo_sapiens + layer_counts: counts + var_feature_name: index + mod1: GEX + mod2: ADT + + - id: openproblems_v1_multimodal/scicar_cell_lines + input_id: scicar_cell_lines + dataset_name: "sci-CAR Cell Lines" + dataset_summary: "sci-CAR profiles of 5k cell line cells (HEK293T, NIH/3T3, A549) across three treatment conditions (DEX 0h, 1h and 3h)" + dataset_description: "Single cell RNA-seq and ATAC-seq co-profiling for HEK293T cells, NIH/3T3 cells, A549 cells across three treatment conditions (DEX 0 hour, 1 hour and 3 hour treatment)." + dataset_reference: cao2018joint + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117089 + dataset_organism: "[homo_sapiens, mus_musculus]" + obs_cell_type: cell_name + layer_counts: counts + var_feature_id: index + var_feature_name: gene_short_name + mod1: GEX + mod2: ATAC + + - id: openproblems_v1_multimodal/scicar_mouse_kidney + input_id: scicar_mouse_kidney + dataset_name: "sci-CAR Mouse Kidney" + dataset_summary: "sci-CAR profiles of 11k mouse kidney cells" + dataset_description: "Single cell RNA-seq and ATAC-seq co-profiling of 11k mouse kidney cells." + dataset_reference: cao2018joint + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117089 + dataset_organism: mus_musculus + obs_cell_type: cell_name + obs_batch: replicate + layer_counts: counts + var_feature_id: index + var_feature_name: gene_short_name + mod1: GEX + mod2: ATAC + +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +publish_dir: s3://openproblems-data/resources/datasets +HERE + + +cat > /tmp/nextflow.config << HERE +process { + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + errorStrategy = "ignore" +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_openproblems_v1_multimodal/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "$params_file" \ + --labels openproblems_v1_multimodal,dataset_loader \ + --config /tmp/nextflow.config \ No newline at end of file diff --git a/src/datasets/resource_scripts/openproblems_v1_multimodal_test.sh b/src/datasets/resource_scripts/openproblems_v1_multimodal_test.sh new file mode 100755 index 0000000000..268a17cf7d --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_v1_multimodal_test.sh @@ -0,0 +1,45 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +export TOWER_WORKSPACE_ID=53907369739130 + +OUTPUT_DIR="resources/datasets" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +params_file="/tmp/datasets_openproblems_v1_multimodal_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_v1_multimodal/citeseq_cbmc + dataset_name: "CITE-Seq CBMC" + dataset_summary: "CITE-seq profiles of 8k Cord Blood Mononuclear Cells" + dataset_description: "8k cord blood mononuclear cells profiled by CITEseq using a panel of 13 antibodies." 
+ dataset_reference: stoeckius2017simultaneous + dataset_url: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE100866 + dataset_organism: homo_sapiens + layer_counts: counts + +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +HERE + +export NXF_VER=22.04.5 +nextflow \ + run . \ + -main-script target/nextflow/datasets/workflows/process_openproblems_v1_multimodal/main.nf \ + -profile docker \ + -resume \ + -params-file "$params_file" \ + --publish_dir "$OUTPUT_DIR" diff --git a/src/datasets/resource_scripts/openproblems_v1_test.sh b/src/datasets/resource_scripts/openproblems_v1_test.sh new file mode 100755 index 0000000000..a79545f052 --- /dev/null +++ b/src/datasets/resource_scripts/openproblems_v1_test.sh @@ -0,0 +1,51 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +export TOWER_WORKSPACE_ID=53907369739130 + +OUTPUT_DIR="resources/datasets" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +params_file="/tmp/datasets_openproblems_v1_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_v1/pancreas + obs_cell_type: celltype + obs_batch: tech + layer_counts: counts + dataset_name: Human pancreas + dataset_url: https://theislab.github.io/scib-reproducibility/dataset_pancreas.html + dataset_reference: luecken2022benchmarking + dataset_summary: Human pancreas cells dataset from the scIB benchmarks + dataset_description: Human pancreatic islet scRNA-seq data from 6 datasets across technologies (CEL-seq, CEL-seq2, Smart-seq2, inDrop, Fluidigm C1, and SMARTER-seq). + dataset_organism: homo_sapiens + +normalization_methods: [log_cp10k, sqrt_cp10k, l1_sqrt] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +output_pca: force_null +output_hvg: force_null +output_knn: force_null +HERE + +export NXF_VER=23.04.2 +nextflow run . 
\ + -main-script target/nextflow/datasets/workflows/process_openproblems_v1/main.nf \ + -profile docker \ + -resume \ + -params-file "$params_file" \ + --publish_dir "$OUTPUT_DIR" + + # -with-tower diff --git a/src/datasets/resource_scripts/tenx_visium.sh b/src/datasets/resource_scripts/tenx_visium.sh new file mode 100755 index 0000000000..d5b54e7ef5 --- /dev/null +++ b/src/datasets/resource_scripts/tenx_visium.sh @@ -0,0 +1,316 @@ +#!/bin/bash + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: tenx_visium/mouse_brain_coronal_section1_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_spatial.tar.gz" +# dataset_name: 10X Visium - Mouse Brain Coronal +# dataset_url: "https://www.10xgenomics.com/datasets/mouse-brain-coronal-section-1-ffpe-2-standard" +# dataset_summary: Gene expression library of Mouse Brain (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set +# dataset_description: "FFPE Mouse Brain tissue blocks sectioned as described in Visium CytAssist Spatial Gene Expression for FFPE - Tissue Preparation Guide Demonstrated Protocol. The H&E stained glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression slide. The probe extension and library construction steps follow the standard Visium for FFPE workflow outside of the instrument. The H&E image was acquired using Olympus VS200 Slide Scanning Microscope. Sequencing depth was 53,497 reads per spot. Sequencing configuration: 28bp read 1 (16bp Visium spatial barcode, 12bp UMI), 90bp read 2 (transcript), 10bp i7 sample barcode and 10bp i5 sample barcode. Key metrics include: 2,310 spots detected under tissue; 6,736 median genes per spot; 24,862 median UMI counts per spot." +# dataset_reference: 10x2022brain +# dataset_organism: Mus musculus +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/human_colorectal_cancer_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/CytAssist_11mm_FFPE_Human_Colorectal_Cancer_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Colorectal_Cancer/CytAssist_11mm_FFPE_Human_Colorectal_Cancer_spatial.tar.gz" +# dataset_name: 10X Visium - Human Colorectal Cancer +# dataset_url: "https://www.10xgenomics.com/datasets/human-colorectal-cancer-11-mm-capture-area-ffpe-2-standard" +# dataset_summary: Gene expression library of Human Colorectal Cancer (CytAssist FFPE) using the Human Whole Transcriptome Probe Set +# dataset_description: "The tissue was sectioned as described in the Visium CytAssist Spatial Gene Expression for FFPE Tissue Preparation Guide (CG000518). Tissue section of 5 µm was placed on a standard glass slide, then stained following the Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000520). The glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression Slide v2, with 11 mm capture areas following the Visium CytAssist Spatial Gene Expression Reagent Kits User Guide (CG000495)." 
+# dataset_reference: 10x2023colorectal +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/human_heart_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Heart/V1_Human_Heart_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Heart/V1_Human_Heart_spatial.tar.gz" +# dataset_name: 10X Visium - Human Heart +# dataset_url: "https://www.10xgenomics.com/datasets/human-heart-1-standard-1-0-0" +# dataset_summary: V1_Human_Heart +# dataset_description: "10x Genomics obtained fresh frozen human heart tissue from BioIVT Asterand. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols - Tissue Preparation Guide Demonstrated Protocol (CG000240). Tissue sections of 10 µm thickness were placed on Visium Gene Expression Slides." +# dataset_reference: 10x2019heart +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/mouse_embryo_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_11mm_FFPE_Mouse_Embryo/CytAssist_11mm_FFPE_Mouse_Embryo_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_11mm_FFPE_Mouse_Embryo/CytAssist_11mm_FFPE_Mouse_Embryo_spatial.tar.gz" +# dataset_name: 10X Visium - Mouse Embryo +# dataset_url: "https://www.10xgenomics.com/datasets/visium-cytassist-mouse-embryo-11-mm-capture-area-ffpe-2-standard" +# dataset_summary: Gene expression library of Mouse Embryo (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set +# dataset_description: "The tissue was sectioned as described in Visium CytAssist Spatial Gene Expression for FFPE Tissue Preparation Guide Demonstrated Protocol CG000518. Tissue sections of 5 µm was placed on a standard glass slide, and H&E-stained following deparaffinization. Sections were coverslipped with 85% glycerol, imaged, decoverslipped, followed by dehydration & decrosslinking (Demonstrated Protocol CG000520). The glass slide with the tissue section was processed with the Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression slide (11 mm Capture Area). The probe extension and library construction steps follow the standard Visium for FFPE workflow outside of the instrument." +# dataset_reference: 10x2023embryo +# dataset_organism: Mus musculus +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 50 +# remove_mitochondrial: false + +# - id: tenx_visium/mouse_olfactory_bulb_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_spatial.tar.gz" +# dataset_name: 10X Visium - Mouse Olfactory Bulb +# dataset_url: "https://www.10xgenomics.com/datasets/adult-mouse-olfactory-bulb-1-standard-1" +# dataset_summary: 10X Genomics obtained fresh frozen mouse olfactory bulb tissue from BioIVT. +# dataset_description: "The tissue was embedded and cryosectioned as described in Visium Spatial Protocols Tissue Preparation Guide (Demonstrated Protocol CG000240). 
Tissue sections of 10µm were placed on Visium Gene Expression slides, then fixed and stained following Methanol Fixation, H&E Staining & Imaging for Visium Spatial Protocols (CG000160)." +# dataset_reference: 10x2022olfactory +# dataset_organism: Mus musculus +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 30 +# remove_mitochondrial: false + +# - id: tenx_visium/human_breast_cancer_1_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_BreastCancer/Parent_Visium_Human_BreastCancer_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_BreastCancer/Parent_Visium_Human_BreastCancer_spatial.tar.gz" +# dataset_name: 10X Visium - Human Breast Cancer 1 +# dataset_url: "https://www.10xgenomics.com/datasets/human-breast-cancer-whole-transcriptome-analysis-1-standard-1-2-0" +# dataset_summary: Whole transcriptome analysis, Adult Human Breast Cancer (Visium) +# dataset_description: "10X Genomics obtained fresh frozen human Invasive Lobular Carcinoma breast tissue from BioIVT Asterand. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols Tissue Preparation Guide Demonstrated Protocol (CG000240). Tissue sections of 10µm were placed on Visium Gene Expression slides and fixed and stained following Methanol Fixation, H&E Staining & Imaging for Visium Spatial Protocols (CG000160)." +# dataset_reference: 10x2020breast +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/human_lymph_node_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.0.0/V1_Human_Lymph_Node/V1_Human_Lymph_Node_spatial.tar.gz" +# dataset_name: 10X Visium - Human Lymph Node +# dataset_url: "https://www.10xgenomics.com/datasets/human-lymph-node-1-standard-1-0-0" +# dataset_summary: Whole transcriptome analysis, Human Lymph Node +# dataset_description: "10x Genomics obtained fresh frozen human lymph node from BioIVT Asterand. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols - Tissue Preparation Guide Demonstrated Protocol (CG000240). Tissue sections of 10 µm thickness were placed on Visium Gene Expression Slides." +# dataset_reference: 10x2019lymph +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/human_normal_prostate_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Normal_Prostate/Visium_FFPE_Human_Normal_Prostate_spatial.tar.gz" +# dataset_name: 10X Visium - Human Normal Prostate +# dataset_url: "https://www.10xgenomics.com/datasets/normal-human-prostate-ffpe-1-standard-1-3-0" +# dataset_summary: Gene expression library of Human Normal Prostate (Visium FFPE) using the Human Whole Transcriptome Probe Set +# dataset_description: "10x Genomics obtained FFPE human prostate tissue from Indivumed Human Tissue Specimens. The tissue was sectioned as described in Visium Spatial Gene Expression for FFPE – Tissue Preparation Guide Demonstrated Protocol (CG000408). 
Tissue sections of 5 µm were placed on Visium Gene Expression slides, then stained following Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000409)." +# dataset_reference: 10x2021prostate +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 30 +# remove_mitochondrial: true + +# - id: tenx_visium/human_prostate_cancer_visium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_IF/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Prostate_IF/Visium_FFPE_Human_Prostate_IF_spatial.tar.gz" +# dataset_name: 10X Visium - Human Prostate Cancer +# dataset_url: "https://www.10xgenomics.com/datasets/human-prostate-cancer-adjacent-normal-section-with-if-staining-ffpe-1-standard" +# dataset_summary: Gene expression library of Human Prostate Cancer (Visium FFPE) with an IF image using the Human Whole Transcriptome Probe Set +# dataset_description: "10x Genomics obtained FFPE human prostate tissue from Indivumed Human Tissue Specimens. Original diagnosis with adenocarcinoma. The tissue was sectioned as described in Visium Spatial Gene Expression for FFPE Tissue Preparation Guide Demonstrated Protocol (CG000408). Tissue sections of 10 µm were placed on Visium Gene Expression slides, then stained following Deparaffinization, Decrosslinking, Immunofluorescence Staining & Imaging Demonstrated Protocol (CG000410)." +# dataset_reference: 10x2022prostate +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +cat > "/tmp/params.yaml" << 'HERE' +param_list: + - id: tenx_visium/human_cerebellum_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_Cerebellum/Parent_Visium_Human_Cerebellum_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.2.0/Parent_Visium_Human_Cerebellum/Parent_Visium_Human_Cerebellum_spatial.tar.gz" + dataset_name: 10X Visium - Adult Human Cerebellum + dataset_url: "https://www.10xgenomics.com/datasets/human-cerebellum-whole-transcriptome-analysis-1-standard-1-2-0" + dataset_summary: Human Cerebellum Whole Transcriptome Analysis + dataset_description: "10X Genomics obtained fresh frozen human cerebellum tissue from BioIVT Asterand. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols Tissue Preparation Guide (Demonstrated Protocol CG000240). Tissue sections of 10µm were placed on Visium Gene Expression slides and fixed and stained following Methanol Fixation, H&E Staining & Imaging for Visium Spatial Protocols (CG000160)." 
+ dataset_reference: 10x2020cerebellum + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: tenx_visium/mouse_kidney_v1_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Mouse_Kidney/V1_Mouse_Kidney_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Mouse_Kidney/V1_Mouse_Kidney_spatial.tar.gz" + dataset_name: 10X Visium - Mouse Kidney 1 + dataset_url: "https://www.10xgenomics.com/datasets/mouse-kidney-section-coronal-1-standard-1-1-0" + dataset_summary: Mouse Kidney Whole Transcriptome Analysis + dataset_description: "10x Genomics obtained fresh frozen mouse kidney tissue from BioIVT Asterand. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols - Tissue Preparation Guide Demonstrated Protocol (CG000240). Tissue sections of 10 µm thickness from a slice of the coronal plane were placed on Visium Gene Expression slides, then stained following the Methanol Fixation, H&E Staining & Imaging Demonstrated Protocol (CG000160)." + dataset_reference: 10x2020kidney + dataset_organism: Mus musculus + spot_filter_min_genes: 100 + gene_filter_min_spots: 30 + remove_mitochondrial: false + + - id: tenx_visium/human_lung_cancer_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Lung_Cancer/CytAssist_11mm_FFPE_Human_Lung_Cancer_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Lung_Cancer/CytAssist_11mm_FFPE_Human_Lung_Cancer_spatial.tar.gz" + dataset_name: 10X Visium - Human Lung Cancer + dataset_url: "https://www.10xgenomics.com/datasets/human-lung-cancer-11-mm-capture-area-ffpe-2-standard" + dataset_summary: Gene expression library of Human Lung Cancer (CytAssist FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "10x Genomics obtained FFPE human lung cancer tissue from Avaden Biosciences. The tissue was sectioned as described in the Visium CytAssist Spatial Gene Expression for FFPE Tissue Preparation Guide (CG000518). Tissue section of 5 µm was placed on a standard glass slide, then stained following the Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000520). The glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression Slide v2, with 11 mm capture areas following the Visium CytAssist Spatial Gene Expression Reagent Kits User Guide (CG000495)." 
+ dataset_reference: 10x2023lung + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: tenx_visium/human_brain_cancer_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Glioblastoma/CytAssist_11mm_FFPE_Human_Glioblastoma_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Glioblastoma/CytAssist_11mm_FFPE_Human_Glioblastoma_spatial.tar.gz" + dataset_name: 10X Visium - Human Brain Cancer + dataset_url: "https://www.10xgenomics.com/datasets/human-brain-cancer-11-mm-capture-area-ffpe-2-standard" + dataset_summary: Gene expression library of Human Glioblastoma (CytAssist FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "10x Genomics obtained FFPE human brain cancer tissue from Avaden Biosciences. The tissue was sectioned as described in the Visium CytAssist Spatial Gene Expression for FFPE - Tissue Preparation Guide (CG000518). Tissue section of 5 µm was placed on a standard glass slide, then stained following the Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000520). The glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression Slide v2, with 11 mm capture areas following the Visium CytAssist Spatial Gene Expression Reagent Kits User Guide (CG000495)." + dataset_reference: 10x2023brain + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 100 + remove_mitochondrial: true + + - id: tenx_visium/human_kidney_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Kidney/CytAssist_11mm_FFPE_Human_Kidney_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.1/CytAssist_11mm_FFPE_Human_Kidney/CytAssist_11mm_FFPE_Human_Kidney_spatial.tar.gz" + dataset_name: 10X Visium - Human Kidney + dataset_url: "https://www.10xgenomics.com/datasets/human-kidney-11-mm-capture-area-ffpe-2-standard" + dataset_summary: Gene expression library of Human Kidney (CytAssist FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "10x Genomics obtained FFPE human kidney tissue from Avaden Biosciences. The tissue was sectioned as described in the Visium CytAssist Spatial Gene Expression for FFPE – Tissue Preparation Guide (CG000518). Tissue section of 5 µm was placed on a standard glass slide, then stained following the Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000520). The glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression Slide v2, with 11 mm capture areas following the Visium CytAssist Spatial Gene Expression Reagent Kits User Guide (CG000495)." 
+ dataset_reference: 10x2023kidney + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: tenx_visium/human_intestinal_cancer_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Intestinal_Cancer/Visium_FFPE_Human_Intestinal_Cancer_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Intestinal_Cancer/Visium_FFPE_Human_Intestinal_Cancer_spatial.tar.gz" + dataset_name: 10X Visium - Human Intestine Cancer + dataset_url: "https://www.10xgenomics.com/datasets/human-intestine-cancer-1-standard" + dataset_summary: Gene expression library of Human Intestinal Cancer (Visium FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "5 µm section from Human Intestinal Cancer. FFPE tissue purchased from BioIVT Asterand Human Tissue Specimens. Libraries were prepared following the Visium Spatial Gene Expression Reagent Kits for FFPE User Guide (CG000407 Rev A)." + dataset_reference: 10x2022intestine + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 30 + remove_mitochondrial: true + + - id: tenx_visium/human_skin_melanoma_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Human_Skin_Melanoma/CytAssist_FFPE_Human_Skin_Melanoma_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Human_Skin_Melanoma/CytAssist_FFPE_Human_Skin_Melanoma_spatial.tar.gz" + dataset_name: 10X Visium - Human Skin Melanoma + dataset_url: "https://www.10xgenomics.com/datasets/human-melanoma-if-stained-ffpe-2-standard" + dataset_summary: Gene expression library of Human Skin Melanoma (CytAssist FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "10x Genomics obtained FFPE Human Melanoma tissue blocks from Avaden Biosciences. The tissue was sectioned as described in Visium CytAssist Spatial Gene Expression for FFPE Tissue Preparation Guide Demonstrated Protocol (CG000518). Tissue sections of 5 µm was placed on a standard glass slide, deparaffinized followed by immunofluorescence (IF) staining. Sections were coverslipped with 85% glycerol, imaged, decoverslipped, followed by dehydration & decrosslinking Demonstrated Protocol (CG000519). The glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression slide. The probe extension and library construction steps follow the standard Visium for FFPE workflow outside of the instrument." 
+ dataset_reference: 10x2022melanoma + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: tenx_visium/human_cervical_cancer_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Cervical_Cancer/Visium_FFPE_Human_Cervical_Cancer_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Cervical_Cancer/Visium_FFPE_Human_Cervical_Cancer_spatial.tar.gz" + dataset_name: 10X Visium - Human Cervical Cancer + dataset_url: "https://www.10xgenomics.com/datasets/human-cervical-cancer-1-standard" + dataset_summary: Gene expression library of Human Cervical Cancer (Visium FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "5 µm section from squamous cell carcinoma of human cervical cancer. FFPE tissue purchased from Discovery Life Sciences." + dataset_reference: 10x2022cervical + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: tenx_visium/human_breast_cancer_2_visium + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Breast_Cancer/Visium_FFPE_Human_Breast_Cancer_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/1.3.0/Visium_FFPE_Human_Breast_Cancer/Visium_FFPE_Human_Breast_Cancer_spatial.tar.gz" + dataset_name: 10X Visium - Human Breast Cancer 2 + dataset_url: "https://www.10xgenomics.com/datasets/human-breast-cancer-ductal-carcinoma-in-situ-invasive-carcinoma-ffpe-1-standard-1-3-0" + dataset_summary: Gene expression library of Human Breast Cancer (Visium FFPE) using the Human Whole Transcriptome Probe Set + dataset_description: "10x Genomics obtained FFPE human breast tissue from BioIVT Asterand Human Tissue Specimens. The tissue was annotated with Ductal Carcinoma In Situ, Invasive Carcinoma. The tissue was sectioned as described in Visium Spatial Gene Expression for FFPE – Tissue Preparation Guide Demonstrated Protocol (CG000408). Tissue sections of 5 µm were placed on Visium Gene Expression slides, then stained following Deparaffinization, H&E Staining, Imaging & Decrosslinking Demonstrated Protocol (CG000409)." 
+ dataset_reference: 10x2021breast + dataset_organism: Homo sapiens + spot_filter_min_genes: 100 + gene_filter_min_spots: 50 + remove_mitochondrial: true + +normalization_methods: [log_cp10k] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +publish_dir: resources/datasets +HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: tenx_visium/human_colon_cancer_xenium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Human_Colon_Post_Xenium_Rep1/CytAssist_FFPE_Human_Colon_Post_Xenium_Rep1_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FFPE_Human_Colon_Post_Xenium_Rep1/CytAssist_FFPE_Human_Colon_Post_Xenium_Rep1_spatial.tar.gz" +# dataset_name: 10X Xenium - Human Colon +# dataset_url: "https://www.10xgenomics.com/datasets/visium-cytassist-gene-expression-libraries-of-post-xenium-human-colon-cancer-ffpe-using-the-human-whole-transcriptome-probe-set-2-standard" +# dataset_summary: Gene expression library of Post Xenium Human Colon Cancer (CytAssist FFPE) using the Human Whole Transcriptome Probe Set - Replicate 1 +# dataset_description: "This dataset is provided as part of the Technical Note: Post-Xenium In Situ Applications: Immunofluorescence, H&E, and Visium CytAssist Spatial Gene Expression (CG000709). Post-Xenium samples were compared to controls (samples not processed through the Xenium workflow) using 5 µm (FFPE) serial sections." +# dataset_reference: 10x2023colon +# dataset_organism: Homo sapiens +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: tenx_visium/mouse_brain_xenium +# input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FreshFrozen_Mouse_Brain_Post_Xenium_Rep1/CytAssist_FreshFrozen_Mouse_Brain_Post_Xenium_Rep1_filtered_feature_bc_matrix.h5" +# input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.1.0/CytAssist_FreshFrozen_Mouse_Brain_Post_Xenium_Rep1/CytAssist_FreshFrozen_Mouse_Brain_Post_Xenium_Rep1_spatial.tar.gz" +# dataset_name: 10X Xenium - Mouse Brain +# dataset_url: "https://www.10xgenomics.com/datasets/visium-cytassist-gene-expression-libraries-of-post-xenium-mouse-brain-ff-using-the-mouse-whole-transcriptome-probe-set-2-standard" +# dataset_summary: Gene expression library of Post Xenium Mouse Brain (CytAssist Fresh Frozen) using the Mouse Whole Transcriptome Probe Set - Replicate 1 +# dataset_description: "This dataset is provided as part of the Technical Note: Post-Xenium In Situ Applications: Immunofluorescence, H&E, and Visium CytAssist Spatial Gene Expression (CG000709). Post-Xenium samples were compared to controls (samples not processed through the Xenium workflow) using 10 µm fresh-frozen (FF) serial sections." 
+# dataset_reference: 10x2023mousebrain +# dataset_organism: Mus musculus +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 50 +# remove_mitochondrial: false + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withLabel: highmem { + memory = '350GB' + } + withName: '.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision integration_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_tenx_visium/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "/tmp/params.yaml" \ + --config /tmp/nextflow.config diff --git a/src/datasets/resource_scripts/zenodo_spatial.sh.sh b/src/datasets/resource_scripts/zenodo_spatial.sh.sh new file mode 100755 index 0000000000..7842b4368f --- /dev/null +++ b/src/datasets/resource_scripts/zenodo_spatial.sh.sh @@ -0,0 +1,414 @@ +#!/bin/bash + +cat > "/tmp/params.yaml" << 'HERE' +param_list: + - id: zenodo_spatial/human_heart_myocardial_infarction_1_visium + input_data: "https://zenodo.org/records/13328275/files/10X0018.h5ad?download=1" + dataset_name: 10X Visium - Human Heart MI 1 + dataset_url: "https://www.nature.com/articles/s41586-022-05060-x" + dataset_summary: Gene expression library of human heart using 10x Visium. + dataset_description: "Frozen heart samples were embedded in OCT (Tissue-Tek) and cryosectioned (Thermo Cryostar). The 10-µm section was placed on the pre-chilled Optimization slides (Visium, 10X Genomics, PN-1000193) and the optimal lysis time was determined. The tissues were treated as recommended by 10X Genomics and the optimization procedure showed an optimal permeabilization time of 12 or 18 min of digestion and release of RNA from the tissue slide. Spatial gene expression slides (Visium, 10X Genomics, PN-1000187) were used for spatial transcriptomics following the Visium User Guides" + dataset_reference: kuppe2022spatial + dataset_organism: Homo sapiens + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: zenodo_spatial/human_heart_myocardial_infarction_2_visium + input_data: "https://zenodo.org/records/13328275/files/10X009.h5ad?download=1" + dataset_name: 10X Visium - Human Heart MI 2 + dataset_url: "https://www.nature.com/articles/s41586-022-05060-x" + dataset_summary: Gene expression library of human heart using 10x Visium. + dataset_description: "Frozen heart samples were embedded in OCT (Tissue-Tek) and cryosectioned (Thermo Cryostar). The 10-µm section was placed on the pre-chilled Optimization slides (Visium, 10X Genomics, PN-1000193) and the optimal lysis time was determined. The tissues were treated as recommended by 10X Genomics and the optimization procedure showed an optimal permeabilization time of 12 or 18 min of digestion and release of RNA from the tissue slide. 
Spatial gene expression slides (Visium, 10X Genomics, PN-1000187) were used for spatial transcriptomics following the Visium User Guides" + dataset_reference: kuppe2022spatial + dataset_organism: Homo sapiens + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: true + +normalization_methods: [log_cp10k] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +publish_dir: resources/datasets +remove_mitochondrial: true +HERE + +# catt > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/mouse_e10_brain_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_E10_brain_gene_25um_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Brain (E10) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E10 whole mouse embryo tissue (brain in early-stage organogenesis) profiled using DBiT-seq." +# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_e10_eye_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_E10_eye_and_nearby_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Eye (E10) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E10 whole mouse embryo tissue (eye in early-stage organogenesis) profiled using DBiT-seq." +# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_e10_whole_body_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_E10_whole_gene_best_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Whole Body (E10) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E10 whole mouse embryo tissue profiled using DBiT-seq." +# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_e11_lower_body_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_E11_lower_body_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Lower Body (E11) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E11 whole mouse embryo tissue (lower body in early-stage organogenesis) profiled using DBiT-seq." 
+# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_e11_1_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_GSM4364244_E11-FL-1L_gene_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Whole Body 1 (E11) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E11 whole mouse embryo tissue profiled using DBiT-seq." +# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_e11_2_dbitseq +# input_data: "https://zenodo.org/records/12785822/files/DBiT-seq_liu2020high_GSM4364245_E11-FL-2L_gene_data.h5ad?download=1" +# dataset_name: DBiT-seq - Mouse Whole Body 2 (E11) +# dataset_url: "https://www.cell.com/cell/fulltext/S0092-8674(20)31390-8" +# dataset_summary: High-Spatial-Resolution Multi-Omics Sequencing via Deterministic Barcoding in Tissue. +# dataset_description: "Gene expression library of an E11 whole mouse embryo tissue profiled using DBiT-seq." +# dataset_organism: Mus musculus +# dataset_reference: liu2020high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/human_cortex_1_merfish +# input_data: "https://zenodo.org/records/12785822/files/MERFISH_Fang2022Conservation_H18.06.006.MTG.250.expand.rep1_data.h5ad?download=1" +# dataset_name: MERFISH - Human Cortex 1 +# dataset_url: "https://www.science.org/doi/10.1126/science.abm1741" +# dataset_summary: Spatially resolved profiling of human cerebral cortex using multiplexed error-robust fluorescence in situ hybridization (MERFISH). +# dataset_description: "Spatially resolved profiling of human cerebral cortex (middle temporal gyrus) replicate 1 using multiplexed error-robust fluorescence in situ hybridization (MERFISH) (250 gene panel)." +# dataset_organism: Homo sapiens +# dataset_reference: fang2022conservation +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 100 +# remove_mitochondrial: false + +# - id: zenodo_spatial/human_cortex_2_merfish +# input_data: "https://zenodo.org/records/12785822/files/MERFISH_Fang2022Conservation_H18.06.006.MTG.4000.expand.rep1_data.h5ad?download=1" +# dataset_name: MERFISH - Human Cortex 2 +# dataset_url: "https://www.science.org/doi/10.1126/science.abm1741" +# dataset_summary: Spatially resolved profiling of human cerebral cortex using multiplexed error-robust fluorescence in situ hybridization (MERFISH). +# dataset_description: "Spatially resolved profiling of human cerebral cortex (middle temporal gyrus) replicate 1 using multiplexed error-robust fluorescence in situ hybridization (MERFISH) (4000 gene panel)."
+# dataset_organism: Homo sapiens +# dataset_reference: fang2022conservation +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: false + +# - id: zenodo_spatial/human_cortex_3_merfish +# input_data: "https://zenodo.org/records/12785822/files/MERFISH_Fang2022Conservation_H18.06.006.MTG.4000.expand.rep2_data.h5ad?download=1" +# dataset_name: MERFISH - Human Cortex 3 +# dataset_url: "https://www.science.org/doi/10.1126/science.abm1741" +# dataset_summary: Spatially resolved profiling of human cerebral cortex using multiplexed error-robust fluorescence in situ hybridization (MERFISH). +# dataset_description: "Spatially resolved profiling of human cerebral cortex (middle temporal gyrus) replicate 2 using multiplexed error-robust fluorescence in situ hybridization (MERFISH) (4000 gene panel)." +# dataset_organism: Homo sapiens +# dataset_reference: fang2022conservation +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: false + +# - id: zenodo_spatial/human_cortex_4_merfish +# input_data: "https://zenodo.org/records/12785822/files/MERFISH_Fang2022Conservation_H18.06.006.MTG.4000.expand.rep3_data.h5ad?download=1" +# dataset_name: MERFISH - Human Cortex 4 +# dataset_url: "https://www.science.org/doi/10.1126/science.abm1741" +# dataset_summary: Spatially resolved profiling of human cerebral cortex using multiplexed error-robust fluorescence in situ hybridization (MERFISH). +# dataset_description: "Spatially resolved profiling of human cerebral cortex (middle temporal gyrus) replicate 3 using multiplexed error-robust fluorescence in situ hybridization (MERFISH) (4000 gene panel)." +# dataset_organism: Homo sapiens +# dataset_reference: fang2022conservation +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: false + +# - id: zenodo_spatial/mouse_cortex_merfish +# input_data: "https://zenodo.org/records/12785822/files/MERFISH_Fang2022Conservation_mouse1.AUD_TEA_VIS.242.unexpand_data.h5ad?download=1" +# dataset_name: MERFISH - Mouse Cortex +# dataset_url: "https://www.science.org/doi/10.1126/science.abm1741" +# dataset_summary: Spatially resolved profiling of mouse cerebral cortex using multiplexed error-robust fluorescence in situ hybridization (MERFISH). +# dataset_description: "Spatially resolved profiling of mouse cerebral cortex (visual cortex (VIS), auditory cortex (AUD) and temporal association area (TEa) unexpanded sections) using multiplexed error-robust fluorescence in situ hybridization (MERFISH)." +# dataset_organism: Mus musculus +# dataset_reference: fang2022conservation +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/mouse_organogenesis_seqfish +# input_data: "https://zenodo.org/records/12785822/files/seqfish.h5ad?download=1" +# dataset_name: Seqfish - Mouse Organogenesis +# dataset_url: "https://www.nature.com/articles/s41587-021-01006-2" +# dataset_summary: Single-cell spatial expression of mouse organogenesis. +# dataset_description: "Sagittal sections from mouse embryo at the 8-12 ss were profiled by seqFISH."
+# dataset_organism: Mus musculus +# dataset_reference: lohoff2021integration +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 10 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# remove_mitochondrial: true +# HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/mouse_olfactory_bulb_puck_slideseqv2 +# input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15_data_whole.h5ad?download=1" +# dataset_name: Slide-seqV2 - Mouse Olfactory Bulb Puck +# dataset_url: "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +# dataset_summary: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. +# dataset_description: "Gene expression library of mouse olfactory bulk puck profiled using Slide-seq V2." +# dataset_reference: stickels2020highly +# dataset_organism: Mus musculus +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 500 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_cortex_slideseqv2 +# input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_palla2021squidpy_Slide-seqV2_Mouse_Cortex_data_whole.h5ad?download=1" +# dataset_name: Slide-seqV2 - Mouse Cortex +# dataset_url: "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +# dataset_summary: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. +# dataset_description: "Gene expression library of Mouse cortex profiled using Slide-seq V2." +# dataset_reference: stickels2020highly +# dataset_organism: Mus musculus +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 500 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_cerebellum_slideseqv2 +# input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_Slide-seqV2_Mouse_Cerebellum_SCP948_data_whole.h5ad?download=1" +# dataset_name: Slide-seqV2 - Mouse Cerebellum +# dataset_url: "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +# dataset_summary: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. +# dataset_description: "Gene expression library of mouse cerebellum profiled using Slide-seq V2." +# dataset_reference: stickels2020highly +# dataset_organism: Mus musculus +# spot_filter_min_genes: 100 +# gene_filter_min_spots: 500 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_hippocampus_puck_slideseqv2 +# input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_Slide-seqV2_Mouse_Hippocampus_Puck_200115_08_data_whole.h5ad?download=1" +# dataset_name: Slide-seqV2 - Mouse Hippocampus Puck +# dataset_url: "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +# dataset_summary: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. 
+# dataset_description: "Gene expression library of mouse hippocampus puck profiled using Slide-seq V2." +# dataset_reference: stickels2020highly +# dataset_organism: Mus musculus +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 500 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_somatosensory_cortex_puck_slideseqv2 +# input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_Slide-seqV2_Mouse_SomatosensoryCortex_Puck_200306_03_data_whole.h5ad?download=1" +# dataset_name: Slide-seqV2 - Mouse Somatosensory Cortex Puck +# dataset_url: "https://singlecell.broadinstitute.org/single_cell/study/SCP815/sensitive-spatial-genome-wide-expression-profiling-at-cellular-resolution#study-summary" +# dataset_summary: Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. +# dataset_description: "Gene expression library of mouse somatosensory cortex puck profiled using Slide-seq V2." +# dataset_reference: stickels2020highly +# dataset_organism: Mus musculus +# spot_filter_min_genes: 200 +# gene_filter_min_spots: 500 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/mouse_brain_2d_zstep10_0_starmap +# input_data: "https://zenodo.org/records/12785822/files/STARmap_Wang2018three_data_2D_zstep10_0_data.h5ad?download=1" +# dataset_name: STARmap - Mouse Brain 1 +# dataset_url: "https://www.science.org/doi/10.1126/science.aat5691" +# dataset_summary: Three-dimensional intact-tissue sequencing of single-cell transcriptional states. +# dataset_description: "3D architecture of cell types in visual cortex volumes." +# dataset_organism: Mus musculus +# dataset_reference: wang2018three +# spot_filter_min_genes: 1 +# gene_filter_min_spots: 1 +# remove_mitochondrial: true + +# - id: zenodo_spatial/mouse_brain_2d_zstep15_0_starmap +# input_data: "https://zenodo.org/records/12785822/files/STARmap_Wang2018three_data_2D_zstep15_0_data.h5ad?download=1" +# dataset_name: STARmap - Mouse Brain 2 +# dataset_url: "https://www.science.org/doi/10.1126/science.aat5691" +# dataset_summary: Three-dimensional intact-tissue sequencing of single-cell transcriptional states. +# dataset_description: "3D architecture of cell types in visual cortex volumes." +# dataset_organism: Mus musculus +# dataset_reference: wang2018three +# spot_filter_min_genes: 1 +# gene_filter_min_spots: 1 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +# cat > "/tmp/params.yaml" << 'HERE' +# param_list: +# - id: zenodo_spatial/drosophila_embryo_e5_6_stereoseq +# input_data: "https://zenodo.org/records/12785822/files/Stereo-seq_wang2022high_E14-16h_a_count_normal_stereoseq_data_whole_time_point_5.6.h5ad?download=1" +# dataset_name: Stereo-seq - Drosophila embryo E5_6 +# dataset_url: "https://www.sciencedirect.com/science/article/pii/S1534580722002465" +# dataset_summary: Stereo-seq faithfully captures Drosophila spatial transcriptomes with high resolution. 
+# dataset_description: "Drosophila has long been a successful model organism in multiple biomedical fields. Spatial gene expression patterns are critical for the understanding of complex pathways and interactions, whereas temporal gene expression changes are vital for studying highly dynamic physiological activities. Systematic studies in Drosophila are still impeded by the lack of spatiotemporal transcriptomic information. Here, utilizing spatial enhanced resolution omics-sequencing (Stereo-seq), we dissected the spatiotemporal transcriptomic changes of developing Drosophila with high resolution and sensitivity. (Data from an embryo collected 14-16 h after egg laying)" +# dataset_organism: Drosophila +# dataset_reference: wang2022high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/drosophila_embryo_e6_3_stereoseq +# input_data: "https://zenodo.org/records/12785822/files/Stereo-seq_wang2022high_E14-16h_a_count_normal_stereoseq_data_whole_time_point_6.3.h5ad?download=1" +# dataset_name: Stereo-seq - Drosophila embryo E6_3 +# dataset_url: "https://www.sciencedirect.com/science/article/pii/S1534580722002465" +# dataset_summary: Stereo-seq faithfully captures Drosophila spatial transcriptomes with high resolution. +# dataset_description: "Drosophila has long been a successful model organism in multiple biomedical fields. Spatial gene expression patterns are critical for the understanding of complex pathways and interactions, whereas temporal gene expression changes are vital for studying highly dynamic physiological activities. Systematic studies in Drosophila are still impeded by the lack of spatiotemporal transcriptomic information. Here, utilizing spatial enhanced resolution omics-sequencing (Stereo-seq), we dissected the spatiotemporal transcriptomic changes of developing Drosophila with high resolution and sensitivity. (Data from an embryo collected 14-16 h after egg laying)" +# dataset_organism: Drosophila +# dataset_reference: wang2022high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/drosophila_embryo_e7_stereoseq +# input_data: "https://zenodo.org/records/12785822/files/Stereo-seq_wang2022high_E14-16h_a_count_normal_stereoseq_data_whole_time_point_7.h5ad?download=1" +# dataset_name: Stereo-seq - Drosophila embryo E7 +# dataset_url: "https://www.sciencedirect.com/science/article/pii/S1534580722002465" +# dataset_summary: Stereo-seq faithfully captures Drosophila spatial transcriptomes with high resolution. +# dataset_description: "Drosophila has long been a successful model organism in multiple biomedical fields. Spatial gene expression patterns are critical for the understanding of complex pathways and interactions, whereas temporal gene expression changes are vital for studying highly dynamic physiological activities. Systematic studies in Drosophila are still impeded by the lack of spatiotemporal transcriptomic information. Here, utilizing spatial enhanced resolution omics-sequencing (Stereo-seq), we dissected the spatiotemporal transcriptomic changes of developing Drosophila with high resolution and sensitivity. 
(Data from an embryo collected 14-16 h after egg laying)" +# dataset_organism: Drosophila +# dataset_reference: wang2022high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/drosophila_embryo_e9_1_stereoseq +# input_data: "https://zenodo.org/records/12785822/files/Stereo-seq_wang2022high_E14-16h_a_count_normal_stereoseq_data_whole_time_point_9.1.h5ad?download=1" +# dataset_name: Stereo-seq - Drosophila embryo E9_1 +# dataset_url: "https://www.sciencedirect.com/science/article/pii/S1534580722002465" +# dataset_summary: Stereo-seq faithfully captures Drosophila spatial transcriptomes with high resolution. +# dataset_description: "Drosophila has long been a successful model organism in multiple biomedical fields. Spatial gene expression patterns are critical for the understanding of complex pathways and interactions, whereas temporal gene expression changes are vital for studying highly dynamic physiological activities. Systematic studies in Drosophila are still impeded by the lack of spatiotemporal transcriptomic information. Here, utilizing spatial enhanced resolution omics-sequencing (Stereo-seq), we dissected the spatiotemporal transcriptomic changes of developing Drosophila with high resolution and sensitivity. (Data from an embryo collected 14-16 h after egg laying)" +# dataset_organism: Drosophila +# dataset_reference: wang2022high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# - id: zenodo_spatial/drosophila_embryo_e10_stereoseq +# input_data: "https://zenodo.org/records/12785822/files/Stereo-seq_wang2022high_E14-16h_a_count_normal_stereoseq_data_whole_time_point_10.5.h5ad?download=1" +# dataset_name: Stereo-seq - Drosophila embryo E10 +# dataset_url: "https://www.sciencedirect.com/science/article/pii/S1534580722002465" +# dataset_summary: Stereo-seq faithfully captures Drosophila spatial transcriptomes with high resolution. +# dataset_description: "Drosophila has long been a successful model organism in multiple biomedical fields. Spatial gene expression patterns are critical for the understanding of complex pathways and interactions, whereas temporal gene expression changes are vital for studying highly dynamic physiological activities. Systematic studies in Drosophila are still impeded by the lack of spatiotemporal transcriptomic information. Here, utilizing spatial enhanced resolution omics-sequencing (Stereo-seq), we dissected the spatiotemporal transcriptomic changes of developing Drosophila with high resolution and sensitivity. 
(Data from an embryo collected 14-16 h after egg laying)" +# dataset_organism: Drosophila +# dataset_reference: wang2022high +# spot_filter_min_genes: 10 +# gene_filter_min_spots: 50 +# remove_mitochondrial: true + +# normalization_methods: [log_cp10k] +# output_dataset: '$id/dataset.h5ad' +# output_meta: '$id/dataset_metadata.yaml' +# output_state: '$id/state.yaml' +# output_raw: force_null +# output_normalized: force_null +# publish_dir: resources/datasets +# HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withLabel: highmem { + memory = '350GB' + } + withName: '.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_zenodo_spatial/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "/tmp/params.yaml" \ + --config /tmp/nextflow.config diff --git a/src/datasets/resource_scripts/zenodo_spatial_slidetags.sh b/src/datasets/resource_scripts/zenodo_spatial_slidetags.sh new file mode 100755 index 0000000000..5ab4962240 --- /dev/null +++ b/src/datasets/resource_scripts/zenodo_spatial_slidetags.sh @@ -0,0 +1,82 @@ +#!/bin/bash + +cat > "/tmp/params.yaml" << 'HERE' +param_list: + - id: zenodo_spatial_slidetags/human_cortex_slidetags + input_data: "https://zenodo.org/records/12785822/files/slidetag_human_cortex.tar.gz?download=1" + dataset_name: Slide-tags - Human Cortex + dataset_url: "https://www.nature.com/articles/s41586-023-06837-4" + dataset_summary: Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. + dataset_description: "A 100 mm2 region of the human prefrontal cortex from a neurotypical donor aged 78 years was profiled by Slide-tags." + dataset_organism: Homo sapiens + dataset_reference: russell2023slide + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: zenodo_spatial_slidetags/human_skin_melanoma_slidetags + input_data: "https://zenodo.org/records/12785822/files/slidetag_human_skin_melanoma.tar.gz?download=1" + dataset_name: Slide-tags - Human Skin Melanoma + dataset_url: "https://www.nature.com/articles/s41586-023-06837-4" + dataset_summary: Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. + dataset_description: "A metastatic melanoma sample was profiled by Slide-tags." + dataset_organism: Homo sapiens + dataset_reference: russell2023slide + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: zenodo_spatial_slidetags/human_tonsil_slidetags + input_data: "https://zenodo.org/records/12785822/files/slidetag_human_tonsil.tar.gz?download=1" + dataset_name: Slide-tags - Human Tonsil + dataset_url: "https://www.nature.com/articles/s41586-023-06837-4" + dataset_summary: Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. + dataset_description: "A human tonsil was profiled by Slide-tags." 
+ dataset_organism: Homo sapiens + dataset_reference: russell2023slide + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: true + + - id: zenodo_spatial_slidetags/mouse_embryo_slidetags + input_data: "https://zenodo.org/records/12785822/files/slidetag_mouse_embryo.tar.gz?download=1" + dataset_name: Slide-tags - Mouse Embryo + dataset_url: "https://www.nature.com/articles/s41586-023-06837-4" + dataset_summary: Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. + dataset_description: "A mouse embryo was profiled by Slide-tags." + dataset_organism: Mus musculus + dataset_reference: russell2023slide + spot_filter_min_genes: 200 + gene_filter_min_spots: 50 + remove_mitochondrial: false + +normalization_methods: [log_cp10k] +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +publish_dir: resources/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withLabel: highmem { + memory = '350GB' + } + withName: '.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_zenodo_spatial_slidetags/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file "/tmp/params.yaml" \ + --config /tmp/nextflow.config diff --git a/src/datasets/resource_test_scripts/cxg_mouse_pancreas_atlas.sh b/src/datasets/resource_test_scripts/cxg_mouse_pancreas_atlas.sh new file mode 100755 index 0000000000..3b5d35ee5c --- /dev/null +++ b/src/datasets/resource_test_scripts/cxg_mouse_pancreas_atlas.sh @@ -0,0 +1,50 @@ +#!/bin/bash + +DATASET_DIR=resources_test/common + + +mkdir -p $DATASET_DIR + +wget https://raw.githubusercontent.com/theislab/scib/c993ffd9ccc84ae0b1681928722ed21985fb91d1/scib/resources/g2m_genes_tirosh.txt -O $DATASET_DIR/temp_g2m_genes_tirosh_mm.txt +wget https://raw.githubusercontent.com/theislab/scib/c993ffd9ccc84ae0b1681928722ed21985fb91d1/scib/resources/s_genes_tirosh.txt -O $DATASET_DIR/temp_s_genes_tirosh_mm.txt +KEEP_FEATURES=`cat $DATASET_DIR/temp_g2m_genes_tirosh_mm.txt $DATASET_DIR/temp_s_genes_tirosh_mm.txt | paste -sd ":" -` + +cat > "/tmp/params.yaml" << HERE +param_list: + - id: cxg_mouse_pancreas_atlas + species: mus_musculus + census_version: "2023-07-25" + obs_value_filter: "dataset_id == '49e4ffcc-5444-406d-bdee-577127404ba8' and donor_id in ['mouse_pancreatic_islet_atlas_Hrovatin__Fltp_2y__MUC13974', 'mouse_pancreatic_islet_atlas_Hrovatin__Fltp_2y__MUC13975', 'mouse_pancreatic_islet_atlas_Hrovatin__Fltp_2y__MUC13976']" + obs_batch: donor_id + dataset_name: Mouse Pancreatic Islet Atlas + dataset_summary: Mouse pancreatic islet scRNA-seq atlas across sexes, ages, and stress conditions including diabetes + dataset_description: To better understand pancreatic β-cell heterogeneity we generated a mouse pancreatic islet atlas capturing a wide range of biological conditions. The atlas contains scRNA-seq datasets of over 300,000 mouse pancreatic islet cells, of which more than 100,000 are β-cells, from nine datasets with 56 samples, including two previously unpublished datasets.
The samples vary in sex, age (ranging from embryonic to aged), chemical stress, and disease status (including T1D NOD model development and two T2D models, mSTZ and db/db) together with different diabetes treatments. Additional information about data fields is available in anndata uns field 'field_descriptions' and on https://github.com/theislab/mm_pancreas_atlas_rep/blob/main/resources/cellxgene.md. + dataset_url: https://cellxgene.cziscience.com/collections/296237e2-393d-4e31-b590-b03f74ac5070 + dataset_reference: hrovatin2023delineating + dataset_organism: mus_musculus + +normalization_methods: [log_cp10k] +n_obs: 600 +n_vars: 1500 +output_dataset: '\$id/dataset.h5ad' +output_meta: '\$id/dataset_metadata.yaml' +output_state: '\$id/state.yaml' +output_raw: force_null +output_normalized: force_null +output_pca: force_null +output_hvg: force_null +output_knn: force_null +publish_dir: $DATASET_DIR +do_subsample: true +keep_features: '$KEEP_FEATURES' +HERE + +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_cellxgene_census/main.nf \ + -c src/wf_utils/labels_ci.config \ + -profile docker \ + -params-file "/tmp/params.yaml" + +rm -r $DATASET_DIR/temp_* + +# src/tasks/batch_integration/resources_test_scripts/process.sh \ No newline at end of file diff --git a/src/datasets/resource_test_scripts/mouse_brain_coronal_section1.sh b/src/datasets/resource_test_scripts/mouse_brain_coronal_section1.sh new file mode 100755 index 0000000000..e4b889e063 --- /dev/null +++ b/src/datasets/resource_test_scripts/mouse_brain_coronal_section1.sh @@ -0,0 +1,37 @@ +#!/bin/bash + +set -e + +cat > /tmp/params.yaml << 'HERE' +param_list: + - id: mouse_brain_coronal_section1 + input_expression: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_filtered_feature_bc_matrix.h5" + input_spatial: "https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/CytAssist_FFPE_Mouse_Brain_Rep1/CytAssist_FFPE_Mouse_Brain_Rep1_spatial.tar.gz" + dataset_name: Mouse Brain Coronal Section 1 (FFPE) + dataset_url: "https://www.10xgenomics.com/datasets/mouse-brain-coronal-section-1-ffpe-2-standard" + dataset_summary: Gene expression library of Mouse Brain (CytAssist FFPE) using the Mouse Whole Transcriptome Probe Set + dataset_description: "FFPE Mouse Brain tissue blocks sectioned as described in Visium CytAssist Spatial Gene Expression for FFPE - Tissue Preparation Guide Demonstrated Protocol. The H&E stained glass slide with tissue section was processed via Visium CytAssist instrument to transfer analytes to a Visium CytAssist Spatial Gene Expression slide. The probe extension and library construction steps follow the standard Visium for FFPE workflow outside of the instrument. The H&E image was acquired using Olympus VS200 Slide Scanning Microscope. Sequencing depth was 53,497 reads per spot. Sequencing configuration: 28bp read 1 (16bp Visium spatial barcode, 12bp UMI), 90bp read 2 (transcript), 10bp i7 sample barcode and 10bp i5 sample barcode. Key metrics include: 2,310 spots detected under tissue; 6,736 median genes per spot; 24,862 median UMI counts per spot." 
+ dataset_reference: 10x2022brain + dataset_organism: Mus musculus + +normalization_methods: [log_cp10k] +n_obs: 600 +n_vars: 500 +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +publish_dir: resources_test/common +do_subsample: true +spot_filter_min_genes: 200 +gene_filter_min_spots: 50 +remove_mitochondrial: true +HERE + +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_tenx_visium/main.nf \ + -c src/wf_utils/labels_ci.config \ + -profile docker \ + -params-file "/tmp/params.yaml" + diff --git a/src/datasets/resource_test_scripts/neurips2021_bmmc.sh b/src/datasets/resource_test_scripts/neurips2021_bmmc.sh new file mode 100755 index 0000000000..98644d9dbf --- /dev/null +++ b/src/datasets/resource_test_scripts/neurips2021_bmmc.sh @@ -0,0 +1,71 @@ +#!/bin/bash + +set -e + +params_file="/tmp/datasets_openproblems_neurips2021_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_neurips2021/bmmc_cite + # input: "/tmp/neurips2021_bmmc_cite.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fcite%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ADT + dataset_name: OpenProblems NeurIPS2021 CITE-Seq + dataset_organism: homo_sapiens + dataset_summary: Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + + - id: openproblems_neurips2021/bmmc_multiome + # input: "/tmp/neurips2021_bmmc_multiome.h5ad" + input: "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE194nnn/GSE194122/suppl/GSE194122%5Fopenproblems%5Fneurips2021%5Fmultiome%5FBMMC%5Fprocessed%2Eh5ad%2Egz" + mod1: GEX + mod2: ATAC + dataset_name: OpenProblems NeurIPS2021 Multiome + dataset_organism: homo_sapiens + dataset_summary: Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site."
+ +dataset_url: "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122" +dataset_reference: luecken2021neurips +normalization_methods: [log_cp10k] +do_subsample: true +even: true +n_obs: 600 +n_vars: 1500 +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +# publish_dir: s3://openproblems-data/resources_test/common +HERE + +# cat > /tmp/nextflow.config << HERE +# process { +# withName:'.*publishStatesProc' { +# memory = '16GB' +# disk = '100GB' +# } +# } +# HERE + +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf \ + -profile docker \ + -resume \ + --publish_dir resources_test/common \ + -params-file "$params_file" \ + -c src/wf_utils/labels.config + +# tw launch https://github.com/openproblems-bio/openproblems.git \ +# --revision main_build \ +# --main-script target/nextflow/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf \ +# --workspace 53907369739130 \ +# --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ +# --params-file "$params_file" \ +# --config /tmp/nextflow.config \ +# --labels predict_modality + +# run task process dataset components +src/tasks/predict_modality/resources_test_scripts/neurips2021_bmmc.sh \ No newline at end of file diff --git a/src/datasets/resource_test_scripts/neurips2022_pbmc.sh b/src/datasets/resource_test_scripts/neurips2022_pbmc.sh new file mode 100755 index 0000000000..b62e6f40e1 --- /dev/null +++ b/src/datasets/resource_test_scripts/neurips2022_pbmc.sh @@ -0,0 +1,76 @@ +#!/bin/bash + +set -e + +params_file="/tmp/datasets_openproblems_neurips2022_params.yaml" + +cat > "$params_file" << 'HERE' +param_list: + - id: openproblems_neurips2022/pbmc_cite + input_mod1: s3://openproblems-nextflow/datasets_private/neurips2022/cite_rna_merged.h5ad + input_mod2: s3://openproblems-nextflow/datasets_private/neurips2022/cite_prot_merged.h5ad + mod1: GEX + mod2: ADT + dataset_name: OpenProblems NeurIPS2022 CITE-Seq + dataset_organism: homo_sapiens + dataset_summary: Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors. + dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + + - id: openproblems_neurips2022/pbmc_multiome + input_mod1: s3://openproblems-nextflow/datasets_private/neurips2022/multiome_rna_merged.h5ad + input_mod2: s3://openproblems-nextflow/datasets_private/neurips2022/multiome_atac_merged.h5ad + mod1: GEX + mod2: ATAC + dataset_name: OpenProblems NeurIPS2022 Multiome + dataset_organism: homo_sapiens + dataset_summary: Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors. 
+ dataset_description: "Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site." + +dataset_url: "https://www.kaggle.com/competitions/open-problems-multimodal/data" +dataset_reference: lance2024predicting +normalization_methods: [log_cp10k] +do_subsample: true +even: true +n_obs: 600 +n_vars: 1500 +output_mod1: '$id/dataset_mod1.h5ad' +output_mod2: '$id/dataset_mod2.h5ad' +output_meta_mod1: '$id/dataset_metadata_mod1.yaml' +output_meta_mod2: '$id/dataset_metadata_mod2.yaml' +output_state: '$id/state.yaml' +publish_dir: s3://openproblems-data/resources_test/common +HERE + +# nextflow run . \ +# -main-script target/nextflow/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf \ +# -profile docker \ +# -resume \ +# --publish_dir resources_test/common \ +# -params-file "$params_file" \ +# -c src/wf_utils/labels.config + + +cat > /tmp/nextflow.config << HERE +process { + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } +} +HERE + + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf \ + --workspace 53907369739130 \ + --compute-env 1pK56PjjzeraOOC2LDZvN2 \ + --params-file "$params_file" \ + --config /tmp/nextflow.config \ + --labels openproblems_neurips2022_pbmc,dataset_loader \ + + + +# run task process dataset components +# src/tasks/predict_modality/resources_test_scripts/neurips2022_pbmc.sh \ No newline at end of file diff --git a/src/datasets/resource_test_scripts/pancreas.sh b/src/datasets/resource_test_scripts/pancreas.sh new file mode 100755 index 0000000000..fb26f7ef30 --- /dev/null +++ b/src/datasets/resource_test_scripts/pancreas.sh @@ -0,0 +1,61 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +DATASET_DIR=resources_test/common + +set -e + +mkdir -p $DATASET_DIR + +wget https://raw.githubusercontent.com/theislab/scib/c993ffd9ccc84ae0b1681928722ed21985fb91d1/scib/resources/g2m_genes_tirosh_hm.txt -O $DATASET_DIR/temp_g2m_genes_tirosh_hm.txt +wget https://raw.githubusercontent.com/theislab/scib/c993ffd9ccc84ae0b1681928722ed21985fb91d1/scib/resources/s_genes_tirosh_hm.txt -O $DATASET_DIR/temp_s_genes_tirosh_hm.txt +KEEP_FEATURES=`cat $DATASET_DIR/temp_g2m_genes_tirosh_hm.txt $DATASET_DIR/temp_s_genes_tirosh_hm.txt | paste -sd ":" -` + +# download dataset +nextflow run . 
\ + -main-script target/nextflow/datasets/workflows/process_openproblems_v1/main.nf \ + -profile docker \ + -c src/wf_utils/labels_ci.config \ + -resume \ + --id pancreas \ + --input_id pancreas \ + --obs_cell_type "celltype" \ + --obs_batch "tech" \ + --var_feature_name "index" \ + --layer_counts "counts" \ + --dataset_name "Human pancreas" \ + --dataset_url "https://theislab.github.io/scib-reproducibility/dataset_pancreas.html" \ + --dataset_reference "luecken2022benchmarking" \ + --dataset_summary "Human pancreas cells dataset from the scIB benchmarks" \ + --dataset_description "Human pancreatic islet scRNA-seq data from 6 datasets across technologies (CEL-seq, CEL-seq2, Smart-seq2, inDrop, Fluidigm C1, and SMARTER-seq)." \ + --dataset_organism "homo_sapiens" \ + --keep_cell_type_categories "acinar:beta" \ + --keep_batch_categories "celseq:inDrop4:smarter" \ + --keep_features "$KEEP_FEATURES" \ + --seed 123 \ + --normalization_methods log_cp10k \ + --do_subsample true \ + --n_obs 600 \ + --n_vars 1500 \ + --output_raw '$id/raw.h5ad' \ + --output_normalized '$id/normalized.h5ad' \ + --output_hvg '$id/hvg.h5ad' \ + --output_pca '$id/pca.h5ad' \ + --output_knn '$id/knn.h5ad' \ + --output_dataset '$id/dataset.h5ad' \ + --output_meta '$id/dataset_meta.yaml' \ + --output_state '$id/state.yaml' \ + --publish_dir "$DATASET_DIR" + +rm -r $DATASET_DIR/temp_* + +# run task process dataset components +src/tasks/batch_integration/resources_test_scripts/process.sh +src/tasks/denoising/resources_test_scripts/pancreas.sh +src/tasks/dimensionality_reduction/resources_test_scripts/pancreas.sh +src/tasks/label_projection/resources_test_scripts/pancreas.sh \ No newline at end of file diff --git a/src/datasets/resource_test_scripts/scicar_cell_lines.sh b/src/datasets/resource_test_scripts/scicar_cell_lines.sh new file mode 100755 index 0000000000..f765744136 --- /dev/null +++ b/src/datasets/resource_test_scripts/scicar_cell_lines.sh @@ -0,0 +1,48 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +DATASET_DIR=resources_test/common + +set -e + +mkdir -p $DATASET_DIR + +# download dataset +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_openproblems_v1_multimodal/main.nf \ + -profile docker \ + -resume \ + --id scicar_cell_lines \ + --input_id scicar_cell_lines \ + --obs_tissue "source" \ + --layer_counts "counts" \ + --obs_cell_type "cell_name" \ + --var_feature_id "index" \ + --var_feature_name "gene_short_name" \ + --dataset_name "sci-CAR cell lines" \ + --dataset_url "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117089" \ + --dataset_reference "cao2018joint" \ + --dataset_summary "sciCAR is a combinatorial indexing-based assay that jointly measures cellular transcriptomes and the accessibility of cellular chromatin in the same cells" \ + --dataset_description "sciCAR is a combinatorial indexing-based assay that jointly measures cellular transcriptomes and the accessibility of cellular chromatin in the same cells. Here, we use two sciCAR datasets that were obtained from the same study. The first dataset contains 4,825 cells from three cell lines (HEK293T cells, NIH/3T3 cells, and A549 cells) at multiple timepoints (0, 1 hour, 3 hours) after dexamethasone treatment. The second dataset contains 11,233 cells from wild-type adult mouse kidney." 
\ + --dataset_organism "[homo_sapiens, mus_musculus]" \ + --mod1 GEX \ + --mod2 ATAC \ + --do_subsample true \ + --n_obs 600 \ + --n_vars 1500 \ + --seed 123 \ + --normalization_methods log_cp10k \ + --output_mod1 '$id/dataset_mod1.h5ad' \ + --output_mod2 '$id/dataset_mod2.h5ad' \ + --output_meta_mod1 '$id/dataset_metadata_mod1.yaml' \ + --output_meta_mod2 '$id/dataset_metadata_mod2.yaml' \ + --output_state '$id/state.yaml' \ + --publish_dir "$DATASET_DIR" + +# run task process dataset components +src/tasks/match_modalities/resources_test_scripts/scicar_cell_lines.sh \ No newline at end of file diff --git a/src/datasets/resource_test_scripts/slideseq_test.sh b/src/datasets/resource_test_scripts/slideseq_test.sh new file mode 100755 index 0000000000..a9050be40a --- /dev/null +++ b/src/datasets/resource_test_scripts/slideseq_test.sh @@ -0,0 +1,36 @@ +#!/bin/bash + +set -e + +cat > /tmp/params.yaml << 'HERE' +param_list: + - id: mouse_cerebellum + input_data: "https://zenodo.org/records/12785822/files/Slide-seqV2_stickels2020highly_stickels2021highly_SlideSeqV2_Mouse_Olfactory_bulb_Puck_200127_15_data_whole.h5ad?download=1" + dataset_name: Mouse cerebellum + dataset_url: "..." + dataset_summary: ... + dataset_description: "..." + dataset_reference: ref + dataset_organism: Mus musculus + +normalization_methods: [log_cp10k] +n_obs: 600 +n_vars: 500 +output_dataset: '$id/dataset.h5ad' +output_meta: '$id/dataset_metadata.yaml' +output_state: '$id/state.yaml' +output_raw: force_null +output_normalized: force_null +publish_dir: resources_test/common +do_subsample: true +spot_filter_min_genes: 200 +gene_filter_min_spots: 50 +remove_mitochondrial: true +HERE + +nextflow run . \ + -main-script target/nextflow/datasets/workflows/process_spatial_from_zenodo/main.nf \ + -c src/wf_utils/labels_ci.config \ + -profile docker \ + -params-file "/tmp/params.yaml" + diff --git a/src/datasets/workflows/extract_dataset_info/config.vsh.yaml b/src/datasets/workflows/extract_dataset_info/config.vsh.yaml new file mode 100644 index 0000000000..58433db567 --- /dev/null +++ b/src/datasets/workflows/extract_dataset_info/config.vsh.yaml @@ -0,0 +1,34 @@ +functionality: + name: "extract_dataset_info" + namespace: "datasets/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + __merge__: /src/datasets/api/file_raw.yaml + required: true + direction: input + - name: Filter arguments + arguments: + - name: "--filter_normalization_id" + type: string + required: false + direction: input + description: If defined, only the normalization with this ID will be included in the output. 
+ multiple: true + example: [ log_cp10k ] + - name: Outputs + arguments: + - name: "--output" + type: file + required: true + direction: output + example: dataset_uns.yaml + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + dependencies: + - name: common/extract_metadata +platforms: + - type: nextflow diff --git a/src/datasets/workflows/extract_dataset_info/main.nf b/src/datasets/workflows/extract_dataset_info/main.nf new file mode 100644 index 0000000000..887812760e --- /dev/null +++ b/src/datasets/workflows/extract_dataset_info/main.nf @@ -0,0 +1,58 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + // only keep one of the normalization methods + | filter{ id, state -> + if (state.filter_normalization_id) { + state.filter_normalization_id.contains(state.dataset_uns.normalization_id) + } else { + true + } + } + + | joinStates { ids, states -> + // remove normalization id + // TODO: make this optional through a parameter? + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + uns.remove("normalization_id") + uns + } + + // store data as yaml + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + def new_state = [ + output: dataset_uns_file, + _meta: [join_id: ids[0]] + ] + ["output", new_state] + } + + + emit: + output_ch +} diff --git a/src/datasets/workflows/extract_dataset_info/run_test.sh b/src/datasets/workflows/extract_dataset_info/run_test.sh new file mode 100755 index 0000000000..9723de008a --- /dev/null +++ b/src/datasets/workflows/extract_dataset_info/run_test.sh @@ -0,0 +1,32 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# export TOWER_WORKSPACE_ID=53907369739130 + +OUTPUT_DIR="output/temp" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +DATASETS_DIR="resources_test/common" + +export NXF_VER=22.04.5 +nextflow run . 
\ + -main-script target/nextflow/datasets/workflows/extract_dataset_info/main.nf \ + -profile docker \ + -resume \ + -c src/wf_utils/labels_ci.config \ + -entry auto \ + --input_states "$DATASETS_DIR/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output": "dataset_info.yaml"}' \ + --publish_dir "$OUTPUT_DIR" \ + --output_state "state.yaml" \ No newline at end of file diff --git a/src/datasets/workflows/extract_dataset_meta/config.vsh.yaml b/src/datasets/workflows/extract_dataset_meta/config.vsh.yaml new file mode 100644 index 0000000000..26041b1039 --- /dev/null +++ b/src/datasets/workflows/extract_dataset_meta/config.vsh.yaml @@ -0,0 +1,25 @@ +functionality: + name: "extract_dataset_meta" + namespace: "datasets/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + __merge__: /src/datasets/api/file_raw.yaml + required: true + direction: input + - name: Outputs + arguments: + - name: "--output" + type: file + required: true + direction: output + example: meta.yaml + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + dependencies: + - name: common/extract_metadata +platforms: + - type: nextflow diff --git a/src/datasets/workflows/extract_dataset_meta/main.nf b/src/datasets/workflows/extract_dataset_meta/main.nf new file mode 100644 index 0000000000..cbac67b571 --- /dev/null +++ b/src/datasets/workflows/extract_dataset_meta/main.nf @@ -0,0 +1,20 @@ +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input"], + toState: [output: "output"] + ) + + | setState([ + "output", + ]) + + emit: + output_ch +} diff --git a/src/datasets/workflows/extract_dataset_meta/run_test.sh b/src/datasets/workflows/extract_dataset_meta/run_test.sh new file mode 100755 index 0000000000..4792938fee --- /dev/null +++ b/src/datasets/workflows/extract_dataset_meta/run_test.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# export TOWER_WORKSPACE_ID=53907369739130 + +OUTPUT_DIR="output/temp" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +DATASETS_DIR="resources_test/common/pancreas/dataset.h5ad" + +export NXF_VER=22.04.5 +nextflow run . \ + -main-script target/nextflow/datasets/workflows/extract_dataset_meta/main.nf \ + -profile docker \ + -resume \ + -c src/wf_utils/labels_ci.config \ + --input $DATASETS_DIR \ + --output meta.yaml \ + --publish_dir "$OUTPUT_DIR" \ No newline at end of file diff --git a/src/datasets/workflows/process_cellxgene_census/config.vsh.yaml b/src/datasets/workflows/process_cellxgene_census/config.vsh.yaml new file mode 100644 index 0000000000..3e1fd5263b --- /dev/null +++ b/src/datasets/workflows/process_cellxgene_census/config.vsh.yaml @@ -0,0 +1,201 @@ +functionality: + name: process_cellxgene_census + namespace: datasets/workflows + description: | + Fetch and process datasets originating from the CELLxGENE census. + argument_groups: + - name: Input database + description: "Open CellxGene Census by version or URI." + arguments: + - name: "--input_uri" + type: string + description: "If specified, a URI containing the Census SOMA objects. If specified, will take precedence over the `--census_version` argument." 
+ required: false + example: "s3://bucket/path" + - name: "--census_version" + description: "Which release of CellxGene census to use. Possible values are \"latest\", \"stable\", or the date of one of the releases (e.g. \"2023-07-25\"). For more information, check the documentation on [Census data releases](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html)." + type: string + example: "stable" + required: false + - name: Cell query + description: Arguments related to the query. + arguments: + - name: "--species" + type: string + description: The organism to query, usually one of `Homo sapiens` or `Mus musculus`. + required: false + default: "homo_sapiens" + multiple: false + - name: "--obs_value_filter" + type: string + description: "Filter for selecting the `obs` metadata (i.e. cells). Value is a filter query written in the SOMA `value_filter` syntax." + required: false + example: "is_primary_data == True and cell_type_ontology_term_id in ['CL:0000136', 'CL:1000311', 'CL:0002616'] and suspension_type == 'cell'" + - name: Cell filter + description: Filter the cells based on a minimum cell count per specified group + arguments: + - name: "--cell_filter_grouping" + type: string + description: | + A subset of 'obs' columns by which to group the cells for filtering. + Only groups surpassing or equal to the `--cell_filter_minimum_count` + threshold will be retained. Take care not to introduce a selection + bias against cells with more fine-grained ontology annotations. + required: false + example: ["dataset_id", "tissue", "assay", "disease", "cell_type"] + multiple: true + - name: "--cell_filter_minimum_count" + type: double + description: | + A minimum number of cells per group to retain. If `--cell_filter_grouping` + is defined, this parameter should also be provided and vice versa. + required: false + example: 100 + - name: Cell metadata + description: Cell metadata arguments + arguments: + - name: "--obs_batch" + type: string + description: | + Location of where to find the observation batch IDs. + + * If not specified, the `.obs["batch"]` field will not be included. + * If one or more values are specified, the `.obs["batch"]` field will be + set to the concatenated values of the specified fields, separated by + the `obs_batch_separator`. + required: false + multiple: true + multiple_sep: "," + example: ["batch"] + - name: "--obs_batch_separator" + type: string + description: Separator to use when concatenating the values of the `--obs_batch` fields. + required: false + default: "+" + - name: Dataset metadata + description: Information about the dataset that will be stored in the `.uns` slot. + arguments: + - name: "--id" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. 
+ required: true + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." + - name: Outputs + arguments: + - name: "--output_dataset" + __merge__: /src/datasets/api/file_common_dataset.yaml + direction: "output" + required: true + - name: "--output_meta" + direction: "output" + type: file + description: "Dataset metadata" + default: "dataset_metadata.yaml" + - name: "--output_raw" + __merge__: /src/datasets/api/file_raw.yaml + direction: "output" + required: false + - name: "--output_normalized" + __merge__: /src/datasets/api/file_normalized.yaml + direction: "output" + required: false + - name: "--output_pca" + __merge__: /src/datasets/api/file_pca.yaml + direction: "output" + required: false + - name: "--output_hvg" + __merge__: /src/datasets/api/file_hvg.yaml + direction: "output" + required: false + - name: "--output_knn" + __merge__: /src/datasets/api/file_knn.yaml + direction: "output" + required: false + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/cellxgene_census + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/processors/subsample + - name: datasets/processors/pca + - name: datasets/processors/hvg + - name: datasets/processors/knn + - name: common/extract_metadata + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow diff --git a/src/datasets/workflows/process_cellxgene_census/main.nf b/src/datasets/workflows/process_cellxgene_census/main.nf new file mode 100644 index 0000000000..bd1fc813a9 --- /dev/null +++ b/src/datasets/workflows/process_cellxgene_census/main.nf @@ -0,0 +1,160 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + 
args: [normalization_id: "log_cp10k", n_cp: 10000], + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000], + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000], + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000], + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"], + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"], + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from legacy openproblems + | cellxgene_census.run( + fromState: [ + "input_uri": "input_uri", + "census_version": "census_version", + "species": "species", + "obs_value_filter": "obs_value_filter", + "cell_filter_grouping": "cell_filter_grouping", + "cell_filter_minimum_count": "cell_filter_minimum_count", + "obs_batch": "obs_batch", + "obs_batch_separator": "obs_batch_separator", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism", + ], + toState: ["output_raw": "output"] + ) + + // subsample if so desired + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "output_raw", + "n_obs": "n_obs", + "n_vars": "n_vars", + "keep_features": "keep_features", + "keep_cell_type_categories": "keep_cell_type_categories", + "keep_batch_categories": "keep_batch_categories", + "even": "even", + "seed": "seed" + ], + args: [output_mod2: null], + toState: ["output_raw": "output"] + ) + + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "output_raw"], + toState: { id, output, state, comp -> + state + [ + output_normalized: output.output, + normalization_id: comp.name + ] + } + ) + + | hvg.run( + fromState: ["input": "output_normalized"], + toState: ["output_hvg": "output"] + ) + + | pca.run( + fromState: ["input": "output_hvg"], + toState: ["output_pca": "output" ] + ) + + | knn.run( + fromState: ["input": "output_pca"], + toState: ["output_knn": "output"] + ) + + // add synonym + | map{ id, state -> + [id, state + [output_dataset: state.output_knn]] + } + + | extract_metadata.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_dataset") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? 
it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_dataset, + "schema": schemaYaml + ] + }, + toState: ["output_meta": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_dataset", + "output_meta", + "output_raw", + "output_normalized", + "output_pca", + "output_hvg", + "output_knn", + "_meta" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/datasets/workflows/process_openproblems_neurips2021_bmmc/config.vsh.yaml b/src/datasets/workflows/process_openproblems_neurips2021_bmmc/config.vsh.yaml new file mode 100644 index 0000000000..8d3ca51d0b --- /dev/null +++ b/src/datasets/workflows/process_openproblems_neurips2021_bmmc/config.vsh.yaml @@ -0,0 +1,137 @@ +functionality: + name: process_openproblems_neurips2021_bmmc + namespace: datasets/workflows + description: | + Fetch and process Neurips 2021 multimodal datasets + argument_groups: + - name: Inputs + arguments: + - name: "--id" + type: "string" + description: "The ID of the dataset" + required: true + - name: "--input" + type: "file" + description: "Path to the input dataset" + required: true + - name: "--mod1" + type: string + description: Name of the first modality. + required: true + example: GEX + - name: "--mod2" + type: string + description: Name of the second modality. + required: true + example: ADT + - name: Metadata + arguments: + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." 
+ example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." + - name: Outputs + arguments: + - name: "--output_mod1" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_mod2" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_meta_mod1" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod1.yaml" + - name: "--output_meta_mod2" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod2.yaml" + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/openproblems_neurips2021_bmmc + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/normalization/prot_clr + - name: datasets/normalization/atac_tfidf + - name: datasets/processors/subsample + - name: datasets/processors/svd + - name: datasets/processors/hvg + - name: common/extract_metadata + - name: common/decompress_gzip + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow diff --git a/src/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf b/src/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf new file mode 100644 index 0000000000..5f3b9867c7 --- /dev/null +++ b/src/datasets/workflows/process_openproblems_neurips2021_bmmc/main.nf @@ -0,0 +1,196 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000] + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000] + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000] + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000] + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"] + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"] + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + | decompress_gzip.run( + fromState: ["input": "input"], + toState: ["input_decompressed": "output"] + ) + + // process neurips downloaded dataset + | openproblems_neurips2021_bmmc.run( + fromState: [ + "input": "input_decompressed", + "mod1": "mod1", + "mod2": "mod2", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism" + ], + toState: [ + "raw_mod1": "output_mod1", + "raw_mod2": "output_mod2" + ] + ) + + // subsample if need be + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + 
"input": "raw_mod1", + "input_mod2": "raw_mod2", + "n_obs": "n_obs", + "n_vars": "n_vars", + "keep_features": "keep_features", + "keep_cell_type_categories": "keep_cell_type_categories", + "keep_batch_categories": "keep_batch_categories", + "even": "even", + "seed": "seed" + ], + toState: [ + "raw_mod1": "output", + "raw_mod2": "output_mod2" + ] + ) + + // run mod1 normalization methods + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "raw_mod1"], + toState: { id, output, state, comp -> + state + [ + "normalization_id": comp.name, + "normalized_mod1": output.output + ] + } + ) + + // run normalization methods on second modality + // TODO: can we change this to DSB? + | prot_clr.run( + runIf: { id, state -> state.mod2 == "ADT" }, + args: [normalization_id: "prot_clr"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + | atac_tfidf.run( + runIf: { id, state -> state.mod2 == "ATAC" }, + args: [normalization_id: "atac_tfidf"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + + | svd.run( + fromState: [ + "input": "normalized_mod1", + "input_mod2": "normalized_mod2" + ], + toState: [ + "svd_mod1": "output", + "svd_mod2": "output_mod2" + ] + ) + + | hvg.run( + fromState: [ "input": "svd_mod1" ], + toState: [ "hvg_mod1": "output" ] + ) + + | hvg.run( + key: "hvg_mod2", + fromState: [ "input": "svd_mod2" ], + toState: [ "hvg_mod2": "output" ] + ) + + // add synonyms + | map{ id, state -> + [id, state + ["output_mod1": state.hvg_mod1, "output_mod2": state.hvg_mod2]] + } + + | extract_metadata.run( + key: "extract_metadata_mod1", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod1") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod1, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod1": "output"] + ) + + | extract_metadata.run( + key: "extract_metadata_mod2", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod2") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? 
it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod2, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod2": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_mod1", + "output_mod2", + "output_meta_mod1", + "output_meta_mod2", + "_meta" + ]) + + emit: + output_ch +} diff --git a/src/datasets/workflows/process_openproblems_neurips2022_pbmc/config.vsh.yaml b/src/datasets/workflows/process_openproblems_neurips2022_pbmc/config.vsh.yaml new file mode 100644 index 0000000000..96bcc3ee2c --- /dev/null +++ b/src/datasets/workflows/process_openproblems_neurips2022_pbmc/config.vsh.yaml @@ -0,0 +1,143 @@ +functionality: + name: process_openproblems_neurips2022_pbmc + namespace: datasets/workflows + description: | + Fetch and process Neurips 2022 multimodal datasets + argument_groups: + - name: Inputs + arguments: + - name: "--id" + type: "string" + description: "The ID of the dataset" + required: true + - name: "--input_mod1" + type: file + description: "Processed RNA h5ad file" + required: true + example: cite_rna_merged.h5ad + - name: "--input_mod2" + type: file + description: "Processed ADT or ATAC h5ad file" + required: true + example: cite_prot_merged.h5ad + - name: "--mod1" + type: string + description: Name of the first modality. + required: true + example: GEX + - name: "--mod2" + type: string + description: Name of the second modality. + required: true + example: ADT + - name: Metadata + arguments: + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." 
+ example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." + - name: Outputs + arguments: + - name: "--output_mod1" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_mod2" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_meta_mod1" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod1.yaml" + - name: "--output_meta_mod2" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod2.yaml" + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/openproblems_neurips2022_pbmc + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/normalization/prot_clr + - name: datasets/normalization/atac_tfidf + - name: datasets/processors/subsample + - name: datasets/processors/svd + - name: datasets/processors/hvg + - name: common/extract_metadata + - name: common/decompress_gzip + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow diff --git a/src/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf b/src/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf new file mode 100644 index 0000000000..834d52bf63 --- /dev/null +++ b/src/datasets/workflows/process_openproblems_neurips2022_pbmc/main.nf @@ -0,0 +1,192 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000] + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000] + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000] + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000] + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"] + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"] + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // process neurips downloaded dataset + | openproblems_neurips2022_pbmc.run( + fromState: [ + "input_mod1": "input_mod1", + "input_mod2": "input_mod2", + "mod1": "mod1", + "mod2": "mod2", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism" + ], + toState: [ + "raw_mod1": "output_mod1", + "raw_mod2": "output_mod2" + ] + ) + + // subsample if need be + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "raw_mod1", + "input_mod2": "raw_mod2", + "n_obs": "n_obs", + "n_vars": 
"n_vars", + "keep_features": "keep_features", + "keep_cell_type_categories": "keep_cell_type_categories", + "keep_batch_categories": "keep_batch_categories", + "even": "even", + "seed": "seed" + ], + toState: [ + "raw_mod1": "output", + "raw_mod2": "output_mod2" + ] + ) + + // run mod1 normalization methods + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "raw_mod1"], + toState: { id, output, state, comp -> + state + [ + "normalization_id": comp.name, + "normalized_mod1": output.output + ] + } + ) + + // run normalization methods on second modality + // TODO: can we change this to DSB? + | prot_clr.run( + runIf: { id, state -> state.mod2 == "ADT" }, + args: [normalization_id: "prot_clr"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + | atac_tfidf.run( + runIf: { id, state -> state.mod2 == "ATAC" }, + args: [normalization_id: "atac_tfidf"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + + | svd.run( + fromState: [ + "input": "normalized_mod1", + "input_mod2": "normalized_mod2" + ], + toState: [ + "svd_mod1": "output", + "svd_mod2": "output_mod2" + ] + ) + + | hvg.run( + fromState: [ "input": "svd_mod1" ], + toState: [ "hvg_mod1": "output" ] + ) + + | hvg.run( + key: "hvg_mod2", + fromState: [ "input": "svd_mod2" ], + toState: [ "hvg_mod2": "output" ] + ) + + // add synonyms + | map{ id, state -> + [id, state + ["output_mod1": state.hvg_mod1, "output_mod2": state.hvg_mod2]] + } + + | extract_metadata.run( + key: "extract_metadata_mod1", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod1") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod1, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod1": "output"] + ) + + | extract_metadata.run( + key: "extract_metadata_mod2", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod2") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod2, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod2": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_mod1", + "output_mod2", + "output_meta_mod1", + "output_meta_mod2", + "_meta" + ]) + + emit: + output_ch +} diff --git a/src/datasets/workflows/process_openproblems_v1/config.vsh.yaml b/src/datasets/workflows/process_openproblems_v1/config.vsh.yaml new file mode 100644 index 0000000000..fb0cd73a65 --- /dev/null +++ b/src/datasets/workflows/process_openproblems_v1/config.vsh.yaml @@ -0,0 +1,163 @@ +functionality: + name: process_openproblems_v1 + namespace: datasets/workflows + description: | + Fetch and process legacy OpenProblems v1 datasets + argument_groups: + - name: Inputs + arguments: + - name: "--id" + type: string + description: Unique identifier of the dataset. 
+ required: true + - name: "--input_id" + type: "string" + description: "The ID of the dataset in OpenProblems v1" + required: true + - name: "--obs_cell_type" + type: "string" + description: "Location of where to find the observation cell types." + - name: "--obs_batch" + type: "string" + description: "Location of where to find the observation batch IDs." + - name: "--obs_tissue" + type: "string" + description: "Location of where to find the observation tissue information." + - name: "--layer_counts" + type: "string" + description: "In which layer to find the counts matrix. Leave undefined to use `.X`." + example: counts + - name: "--sparse" + type: boolean + default: true + description: Convert layers to a sparse CSR format. + - name: "--var_feature_id" + type: "string" + description: "Location of where to find the feature IDs. Can be set to index if the feature IDs are the index." + example: gene_ids + - name: "--var_feature_name" + type: "string" + description: "Location of where to find the feature names. Can be set to index if the feature names are the index." + default: index + - name: Metadata + arguments: + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." 
+ - name: Outputs + arguments: + - name: "--output_dataset" + __merge__: /src/datasets/api/file_common_dataset.yaml + direction: "output" + required: true + - name: "--output_meta" + direction: "output" + type: file + description: "Dataset metadata" + default: "dataset_metadata.yaml" + - name: "--output_raw" + __merge__: /src/datasets/api/file_raw.yaml + direction: "output" + required: false + - name: "--output_normalized" + __merge__: /src/datasets/api/file_normalized.yaml + direction: "output" + required: false + - name: "--output_pca" + __merge__: /src/datasets/api/file_pca.yaml + direction: "output" + required: false + - name: "--output_hvg" + __merge__: /src/datasets/api/file_hvg.yaml + direction: "output" + required: false + - name: "--output_knn" + __merge__: /src/datasets/api/file_knn.yaml + direction: "output" + required: false + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/openproblems_v1 + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/processors/subsample + - name: datasets/processors/pca + - name: datasets/processors/hvg + - name: datasets/processors/knn + - name: common/extract_metadata + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow diff --git a/src/datasets/workflows/process_openproblems_v1/main.nf b/src/datasets/workflows/process_openproblems_v1/main.nf new file mode 100644 index 0000000000..ad57d63029 --- /dev/null +++ b/src/datasets/workflows/process_openproblems_v1/main.nf @@ -0,0 +1,158 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000], + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000], + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000], + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000], + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"], + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"], + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from legacy openproblems + | openproblems_v1.run( + fromState: [ + "input_id": "input_id", + "obs_cell_type": "obs_cell_type", + "obs_batch": "obs_batch", + "obs_tissue": "obs_tissue", + "layer_counts": "layer_counts", + "sparse": "sparse", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism", + ], + toState: ["output_raw": "output"] + ) + + // subsample if so desired + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "output_raw", + "n_obs": "n_obs", + "n_vars": "n_vars", + "keep_features": 
"keep_features", + "keep_cell_type_categories": "keep_cell_type_categories", + "keep_batch_categories": "keep_batch_categories", + "even": "even", + "seed": "seed" + ], + args: [output_mod2: null], + toState: ["output_raw": "output"] + ) + + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "output_raw"], + toState: { id, output, state, comp -> + state + [ + output_normalized: output.output, + normalization_id: comp.name + ] + } + ) + + | hvg.run( + fromState: ["input": "output_normalized"], + toState: ["output_hvg": "output"] + ) + + | pca.run( + fromState: ["input": "output_hvg"], + toState: ["output_pca": "output" ] + ) + + | knn.run( + fromState: ["input": "output_pca"], + toState: ["output_knn": "output"] + ) + + // add synonym + | map{ id, state -> + [id, state + [output_dataset: state.output_knn]] + } + + | extract_metadata.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_dataset") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_dataset, + "schema": schemaYaml + ] + }, + toState: ["output_meta": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_dataset", + "output_meta", + "output_raw", + "output_normalized", + "output_pca", + "output_hvg", + "output_knn", + "_meta" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/datasets/workflows/process_openproblems_v1_multimodal/config.vsh.yaml b/src/datasets/workflows/process_openproblems_v1_multimodal/config.vsh.yaml new file mode 100644 index 0000000000..58b045cc3b --- /dev/null +++ b/src/datasets/workflows/process_openproblems_v1_multimodal/config.vsh.yaml @@ -0,0 +1,161 @@ +functionality: + name: process_openproblems_v1_multimodal + namespace: datasets/workflows + description: | + Fetch and process legacy OpenProblems v1 multimodal datasets + argument_groups: + - name: Inputs + arguments: + - name: "--id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--input_id" + type: "string" + description: "The ID of the dataset in OpenProblems v1" + required: true + - name: "--obs_cell_type" + type: "string" + description: "Location of where to find the observation cell types." + - name: "--obs_batch" + type: "string" + description: "Location of where to find the observation batch IDs." + - name: "--obs_tissue" + type: "string" + description: "Location of where to find the observation tissue information." + - name: "--layer_counts" + type: "string" + description: "In which layer to find the counts matrix. Leave undefined to use `.X`." + example: counts + - name: "--sparse" + type: boolean + default: true + description: Convert layers to a sparse CSR format. + - name: "--var_feature_id" + type: "string" + description: "Location of where to find the feature IDs. Can be set to index if the feature IDs are the index." + example: gene_ids + - name: "--var_feature_name" + type: "string" + description: "Location of where to find the feature names. Can be set to index if the feature names are the index." + default: index + - name: "--mod1" + type: string + description: Name of the first modality. 
+ required: true + example: GEX + - name: "--mod2" + type: string + description: Name of the second modality. + required: true + example: ADT + - name: Metadata + arguments: + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--keep_features" + type: string + multiple: true + description: A list of genes to keep. + - name: "--keep_cell_type_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--keep_batch_categories" + type: "string" + multiple: true + description: "Categories indexes to be selected" + required: false + - name: "--even" + type: "boolean_true" + description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." 
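Note: the multimodal variant differs from process_openproblems_v1 mainly in its paired outputs: instead of a single --output_dataset it emits --output_mod1/--output_mod2 plus per-modality metadata, and (as the main.nf below shows) the second modality is normalized with CLR for ADT or TF-IDF for ATAC rather than the RNA methods. A hedged invocation sketch in the spirit of the scicar_cell_lines script at the top of this diff; the ids and paths are placeholders:

nextflow run . \
  -main-script target/nextflow/datasets/workflows/process_openproblems_v1_multimodal/main.nf \
  -profile docker \
  -c src/wf_utils/labels_ci.config \
  --id scicar_cell_lines \
  --input_id scicar_cell_lines \
  --mod1 GEX \
  --mod2 ATAC \
  --dataset_name "sci-CAR cell lines" \
  --dataset_summary "..." \
  --dataset_description "..." \
  --do_subsample true \
  --n_obs 600 \
  --n_vars 1500 \
  --normalization_methods log_cp10k \
  --output_mod1 '$id/dataset_mod1.h5ad' \
  --output_mod2 '$id/dataset_mod2.h5ad' \
  --output_meta_mod1 '$id/dataset_metadata_mod1.yaml' \
  --output_meta_mod2 '$id/dataset_metadata_mod2.yaml' \
  --publish_dir "resources_test/common"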
+ - name: Outputs + arguments: + - name: "--output_mod1" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_mod2" + direction: "output" + __merge__: /src/datasets/api/file_multimodal_dataset.yaml + - name: "--output_meta_mod1" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod1.yaml" + - name: "--output_meta_mod2" + direction: "output" + type: file + description: "Dataset metadata" + example: "dataset_metadata_mod2.yaml" + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/openproblems_v1_multimodal + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/normalization/prot_clr + - name: datasets/normalization/atac_tfidf + - name: datasets/processors/subsample + - name: datasets/processors/svd + - name: datasets/processors/hvg + - name: common/extract_metadata + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow diff --git a/src/datasets/workflows/process_openproblems_v1_multimodal/main.nf b/src/datasets/workflows/process_openproblems_v1_multimodal/main.nf new file mode 100644 index 0000000000..96d37d6182 --- /dev/null +++ b/src/datasets/workflows/process_openproblems_v1_multimodal/main.nf @@ -0,0 +1,204 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000] + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000] + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000] + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000] + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"] + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"] + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from legacy openproblems + | openproblems_v1_multimodal.run( + fromState: [ + "input_id": "input_id", + "obs_cell_type": "obs_cell_type", + "obs_batch": "obs_batch", + "obs_tissue": "obs_tissue", + "layer_counts": "layer_counts", + "sparse": "sparse", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism" + ], + toState: [ + "raw_mod1": "output_mod1", + "raw_mod2": "output_mod2" + ] + ) + + // subsample if need be + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "raw_mod1", + "input_mod2": "raw_mod2", + "n_obs": "n_obs", + "n_vars": "n_vars", + "keep_features": "keep_features", + "keep_cell_type_categories": "keep_cell_type_categories", + "keep_batch_categories": "keep_batch_categories", + "even": "even", + "seed": "seed" + ], + 
toState: [ + "raw_mod1": "output", + "raw_mod2": "output_mod2" + ] + ) + + // run normalization methods + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "raw_mod1"], + toState: { id, output, state, comp -> + state + [ + "normalization_id": comp.name, + "normalized_mod1": output.output + ] + } + ) + + // run normalization methods on second modality + // TODO: can we change this to DSB? + | prot_clr.run( + runIf: { id, state -> state.mod2 == "ADT" }, + args: [normalization_id: "prot_clr"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + | atac_tfidf.run( + runIf: { id, state -> state.mod2 == "ATAC" }, + args: [normalization_id: "atac_tfidf"], + fromState: ["input": "raw_mod2"], + toState: ["normalized_mod2": "output"] + ) + + | svd.run( + fromState: [ + "input": "normalized_mod1", + "input_mod2": "normalized_mod2" + ], + toState: [ + "svd_mod1": "output", + "svd_mod2": "output_mod2" + ] + ) + + | hvg.run( + fromState: [ "input": "svd_mod1" ], + toState: [ "hvg_mod1": "output" ] + ) + + | hvg.run( + key: "hvg_mod2", + fromState: [ "input": "svd_mod2" ], + toState: [ "hvg_mod2": "output" ] + ) + + // add synonyms + | map{ id, state -> + [id, state + [ + "output_mod1": state.hvg_mod1, + "output_mod2": state.hvg_mod2 + ]] + } + + | extract_metadata.run( + key: "extract_metadata_mod1", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod1") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod1, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod1": "output"] + ) + + | extract_metadata.run( + key: "extract_metadata_mod2", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_mod2") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_mod2, + "schema": schemaYaml + ] + }, + toState: ["output_meta_mod2": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_mod1", + "output_mod2", + "output_meta_mod1", + "output_meta_mod2", + "_meta" + ]) + + emit: + output_ch +} diff --git a/src/datasets/workflows/process_tenx_visium/config.vsh.yaml b/src/datasets/workflows/process_tenx_visium/config.vsh.yaml new file mode 100644 index 0000000000..91a2867820 --- /dev/null +++ b/src/datasets/workflows/process_tenx_visium/config.vsh.yaml @@ -0,0 +1,142 @@ +functionality: + name: process_tenx_visium + namespace: datasets/workflows + description: | + Download and process datasets originating from 10x Genomics. + argument_groups: + - name: Input + arguments: + - name: "--input_expression" + type: string + description: URL to the feature / barcode matrix HDF5. + required: true + - name: "--input_spatial" + type: string + description: URL to the Spatial imaging data. 
+ required: true + - name: Outputs + arguments: + - name: "--output_dataset" + type: file + direction: output + description: Output h5ad file + required: true + __merge__: /src/datasets/api/file_raw.yaml + - name: "--output_meta" + direction: "output" + type: file + description: "Dataset metadata" + default: "dataset_metadata.yaml" + - name: Metadata + arguments: + - name: "--id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. + required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitovhondrial genes? + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + # - name: "--keep_features" + # type: string + # multiple: true + # description: A list of genes to keep. + # - name: "--keep_cell_type_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--keep_batch_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--even" + # type: "boolean_true" + # description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." 
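Note: process_tenx_visium takes two URLs rather than a local file: the filtered feature/barcode HDF5 matrix and the spatial imaging archive of a 10x Genomics Visium run. A minimal invocation sketch with placeholder URLs, id and metadata; the filter thresholds simply reuse the example values declared in the config above:

nextflow run . \
  -main-script target/nextflow/datasets/workflows/process_tenx_visium/main.nf \
  -profile docker \
  -c src/wf_utils/labels_ci.config \
  --id mouse_brain_visium \
  --input_expression "https://example.org/filtered_feature_bc_matrix.h5" \
  --input_spatial "https://example.org/spatial.tar.gz" \
  --dataset_name "Mouse brain (Visium)" \
  --dataset_summary "..." \
  --dataset_description "..." \
  --spot_filter_min_genes 200 \
  --gene_filter_min_spots 50 \
  --remove_mitochondrial true \
  --normalization_methods log_cp10k \
  --output_dataset '$id/dataset.h5ad' \
  --output_meta '$id/dataset_metadata.yaml' \
  --publish_dir "resources_test/common"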
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/tenx_visium + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/processors/subsample + - name: common/extract_metadata +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/datasets/workflows/process_tenx_visium/main.nf b/src/datasets/workflows/process_tenx_visium/main.nf new file mode 100644 index 0000000000..2ec0eae247 --- /dev/null +++ b/src/datasets/workflows/process_tenx_visium/main.nf @@ -0,0 +1,133 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000], + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000], + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000], + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000], + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"], + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"], + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from legacy openproblems + | tenx_visium.run( + fromState: [ + "input_expression": "input_expression", + "input_spatial": "input_spatial", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism", + "spot_filter_min_genes": "spot_filter_min_genes", + "gene_filter_min_spots": "gene_filter_min_spots", + "remove_mitochondrial": "remove_mitochondrial" + ], + toState: ["output_raw": "dataset"] + ) + + // subsample if so desired + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "output_raw", + "n_obs": "n_obs", + "n_vars": "n_vars", + "seed": "seed" + ], + args: [output_mod2: null], + toState: ["output_raw": "output"] + ) + + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "output_raw"], + toState: { id, output, state, comp -> + state + [ + output_normalized: output.output, + normalization_id: comp.name + ] + } + ) + + // add synonym + | map{ id, state -> + [id, state + [output_dataset: state.output_normalized]] + } + + | extract_metadata.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_dataset") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? 
it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_dataset, + "schema": schemaYaml + ] + }, + toState: ["output_meta": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_dataset", + "output_meta", + "_meta" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/datasets/workflows/process_zenodo_spatial/config.vsh.yaml b/src/datasets/workflows/process_zenodo_spatial/config.vsh.yaml new file mode 100644 index 0000000000..45b938b716 --- /dev/null +++ b/src/datasets/workflows/process_zenodo_spatial/config.vsh.yaml @@ -0,0 +1,138 @@ +functionality: + name: process_zenodo_spatial + namespace: datasets/workflows + description: | + Download and process DBiT seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo. + argument_groups: + - name: Input + arguments: + - name: "--input_data" + type: string + description: URL to the Anndata file. + required: true + - name: Outputs + arguments: + - name: "--output_dataset" + type: file + direction: output + description: Output h5ad file + required: true + __merge__: /src/datasets/api/file_raw.yaml + - name: "--output_meta" + direction: "output" + type: file + description: "Dataset metadata" + default: "dataset_metadata.yaml" + - name: Metadata + arguments: + - name: "--id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. + required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. + required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitovhondrial genes? + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 600 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. 
+ default: 500 + # - name: "--keep_features" + # type: string + # multiple: true + # description: A list of genes to keep. + # - name: "--keep_cell_type_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--keep_batch_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--even" + # type: "boolean_true" + # description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/zenodo_spatial + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/processors/subsample + - name: common/extract_metadata +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/datasets/workflows/process_zenodo_spatial/main.nf b/src/datasets/workflows/process_zenodo_spatial/main.nf new file mode 100644 index 0000000000..a5893c0ab4 --- /dev/null +++ b/src/datasets/workflows/process_zenodo_spatial/main.nf @@ -0,0 +1,132 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000], + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000], + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000], + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000], + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"], + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"], + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from legacy openproblems + | zenodo_spatial.run( + fromState: [ + "input_data": "input_data", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism", + "spot_filter_min_genes": "spot_filter_min_genes", + "gene_filter_min_spots": "gene_filter_min_spots", + "remove_mitochondrial": "remove_mitochondrial" + ], + toState: ["output_raw": "dataset"] + ) + + // subsample if so desired + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "output_raw", + "n_obs": "n_obs", + "n_vars": "n_vars", + "seed": "seed" + ], + args: [output_mod2: null], + toState: ["output_raw": "output"] + ) + 
+ | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "output_raw"], + toState: { id, output, state, comp -> + state + [ + output_normalized: output.output, + normalization_id: comp.name + ] + } + ) + + // add synonym + | map{ id, state -> + [id, state + [output_dataset: state.output_normalized]] + } + + | extract_metadata.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_dataset") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_dataset, + "schema": schemaYaml + ] + }, + toState: ["output_meta": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_dataset", + "output_meta", + "_meta" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/datasets/workflows/process_zenodo_spatial_slidetags/config.vsh.yaml b/src/datasets/workflows/process_zenodo_spatial_slidetags/config.vsh.yaml new file mode 100644 index 0000000000..23934fe161 --- /dev/null +++ b/src/datasets/workflows/process_zenodo_spatial_slidetags/config.vsh.yaml @@ -0,0 +1,138 @@ +functionality: + name: process_zenodo_spatial_slidetags + namespace: datasets/workflows + description: | + Download and process slide tags datasets originating from Zenodo. + argument_groups: + - name: Input + arguments: + - name: "--input_data" + type: string + description: URL to the Anndata file. + required: true + - name: Outputs + arguments: + - name: "--output_dataset" + type: file + direction: output + description: Output h5ad file + required: true + __merge__: /src/datasets/api/file_raw.yaml + - name: "--output_meta" + direction: "output" + type: file + description: "Dataset metadata" + default: "dataset_metadata.yaml" + - name: Metadata + arguments: + - name: "--id" + type: string + description: Unique identifier of the dataset. + required: true + - name: "--dataset_name" + type: string + description: Nicely formatted name. + required: true + - name: "--dataset_url" + type: string + description: Link to the original source of the dataset. + required: false + - name: "--dataset_reference" + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: "--dataset_summary" + type: string + description: Short description of the dataset. + required: true + - name: "--dataset_description" + type: string + description: Long description of the dataset. + required: true + - name: "--dataset_organism" + type: string + description: The organism of the dataset. + required: false + - name: Gene or spot filtering + description: Arguments related to filtering cells and genes by counts. + arguments: + - name: "--spot_filter_min_genes" + type: integer + description: Remove spots with less than this number of genes. + required: false + example: 200 + - name: "--spot_filter_min_counts" + type: integer + description: Remove spots with less than this number of counts. + required: false + - name: "--gene_filter_min_spots" + type: integer + description: Remove genes expressed in less than this number of cells. 
+ required: false + example: 50 + - name: "--gene_filter_min_counts" + type: integer + description: Remove genes with less than this number of counts. + required: false + - name: "--remove_mitochondrial" + type: boolean + description: Remove mitovhondrial genes? + required: false + - name: Sampling options + arguments: + - name: "--do_subsample" + type: boolean + default: false + description: "Whether or not to subsample the dataset" + - name: "--n_obs" + type: integer + description: Maximum number of observations to be kept. It might end up being less because empty cells / genes are removed. + default: 600 + - name: "--n_vars" + type: integer + description: Maximum number of variables to be kept. It might end up being less because empty cells / genes are removed. + default: 500 + # - name: "--keep_features" + # type: string + # multiple: true + # description: A list of genes to keep. + # - name: "--keep_cell_type_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--keep_batch_categories" + # type: "string" + # multiple: true + # description: "Categories indexes to be selected" + # required: false + # - name: "--even" + # type: "boolean_true" + # description: Subsample evenly from different batches + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + - name: Normalization + arguments: + - name: "--normalization_methods" + type: string + multiple: true + choices: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt", "log_scran_pooling"] + default: ["log_cp10k", "log_cpm", "sqrt_cp10k", "sqrt_cpm", "l1_sqrt"] + description: "Which normalization methods to run." + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: datasets/loaders/zenodo_spatial_slidetags + - name: datasets/normalization/log_cp + - name: datasets/normalization/log_scran_pooling + - name: datasets/normalization/sqrt_cp + - name: datasets/normalization/l1_sqrt + - name: datasets/processors/subsample + - name: common/extract_metadata +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/datasets/workflows/process_zenodo_spatial_slidetags/main.nf b/src/datasets/workflows/process_zenodo_spatial_slidetags/main.nf new file mode 100644 index 0000000000..2bb6b9300a --- /dev/null +++ b/src/datasets/workflows/process_zenodo_spatial_slidetags/main.nf @@ -0,0 +1,132 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // create different normalization methods by overriding the defaults + normalization_methods = [ + log_cp.run( + key: "log_cp10k", + args: [normalization_id: "log_cp10k", n_cp: 10000], + ), + log_cp.run( + key: "log_cpm", + args: [normalization_id: "log_cpm", n_cp: 1000000], + ), + sqrt_cp.run( + key: "sqrt_cp10k", + args: [normalization_id: "sqrt_cp10k", n_cp: 10000], + ), + sqrt_cp.run( + key: "sqrt_cpm", + args: [normalization_id: "sqrt_cpm", n_cp: 1000000], + ), + l1_sqrt.run( + key: "l1_sqrt", + args: [normalization_id: "l1_sqrt"], + ), + log_scran_pooling.run( + key: "log_scran_pooling", + args: [normalization_id: "log_scran_pooling"], + ) + ] + + output_ch = input_ch + + // store original id for later use + | map{ id, state -> + [id, state + [_meta: [join_id: id]]] + } + + // fetch data from 
legacy openproblems + | zenodo_spatial_slidetags.run( + fromState: [ + "input_data": "input_data", + "dataset_id": "id", + "dataset_name": "dataset_name", + "dataset_url": "dataset_url", + "dataset_reference": "dataset_reference", + "dataset_summary": "dataset_summary", + "dataset_description": "dataset_description", + "dataset_organism": "dataset_organism", + "spot_filter_min_genes": "spot_filter_min_genes", + "gene_filter_min_spots": "gene_filter_min_spots", + "remove_mitochondrial": "remove_mitochondrial" + ], + toState: ["output_raw": "dataset"] + ) + + // subsample if so desired + | subsample.run( + runIf: { id, state -> state.do_subsample }, + fromState: [ + "input": "output_raw", + "n_obs": "n_obs", + "n_vars": "n_vars", + "seed": "seed" + ], + args: [output_mod2: null], + toState: ["output_raw": "output"] + ) + + | runEach( + components: normalization_methods, + id: { id, state, comp -> + if (state.normalization_methods.size() > 1) { + id + "/" + comp.name + } else { + id + } + }, + filter: { id, state, comp -> + comp.name in state.normalization_methods + }, + fromState: ["input": "output_raw"], + toState: { id, output, state, comp -> + state + [ + output_normalized: output.output, + normalization_id: comp.name + ] + } + ) + + // add synonym + | map{ id, state -> + [id, state + [output_dataset: state.output_normalized]] + } + + | extract_metadata.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "output_dataset") + // workaround: convert GString to String + schema = iterateMap(schema, { it instanceof GString ? it.toString() : it }) + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.output_dataset, + "schema": schemaYaml + ] + }, + toState: ["output_meta": "output"] + ) + + // only output the files for which an output file was specified + | setState([ + "output_dataset", + "output_meta", + "_meta" + ]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/tasks/batch_integration/README.md b/src/tasks/batch_integration/README.md new file mode 100644 index 0000000000..073a654508 --- /dev/null +++ b/src/tasks/batch_integration/README.md @@ -0,0 +1,571 @@ +# Batch Integration + + +Remove unwanted batch effects from scRNA data while retaining +biologically meaningful variation. + +Path: +[`src/tasks/batch_integration`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/batch_integration) + +## Motivation + +As single-cell technologies advance, single-cell datasets are growing +both in size and complexity. Especially in consortia such as the Human +Cell Atlas, individual studies combine data from multiple labs, each +sequencing multiple individuals possibly with different technologies. +This gives rise to complex batch effects in the data that must be +computationally removed to perform a joint analysis. These batch +integration methods must remove the batch effect while not removing +relevant biological information. Currently, over 200 tools exist that +aim to remove batch effects scRNA-seq datasets \[@zappia2018exploring\]. +These methods balance the removal of batch effects with the conservation +of nuanced biological information in different ways. This abundance of +tools has complicated batch integration method choice, leading to +several benchmarks on this topic \[@luecken2020benchmarking; +@tran2020benchmark; @chazarragil2021flexible; @mereu2020benchmarking\]. +Yet, benchmarks use different metrics, method implementations and +datasets. 
Here we build a living benchmarking task for batch integration +methods with the vision of improving the consistency of method +evaluation. + +## Description + +In this task we evaluate batch integration methods on their ability to +remove batch effects in the data while conserving variation attributed +to biological effects. As input, methods require either normalised or +unnormalised data with multiple batches and consistent cell type labels. +The batch integrated output can be a feature matrix, a low dimensional +embedding and/or a neighbourhood graph. The respective batch-integrated +representation is then evaluated using sets of metrics that capture how +well batch effects are removed and whether biological variance is +conserved. We have based this particular task on the latest, and most +extensive benchmark of single-cell data integration methods. + +## Authors & contributors + +| name | roles | +|:------------------|:-------------------| +| Michaela Mueller | maintainer, author | +| Kai Waldrant | contributor | +| Robrecht Cannoodt | contributor | +| Daniel Strobl | author | + +## API + +``` mermaid +flowchart LR + file_common_dataset("Common Dataset") + comp_process_dataset[/"Data processor"/] + file_dataset("Dataset") + file_solution("Solution") + comp_control_method_embedding[/"Control method (embedding)"/] + comp_control_method_graaf[/"Control method (graph)"/] + comp_method_embedding[/"Method (embedding)"/] + comp_method_feature[/"Method (feature)"/] + comp_method_graaf[/"Method (graph)"/] + comp_metric_embedding[/"Metric (embedding)"/] + comp_metric_feature[/"Metric (feature)"/] + comp_metric_graaf[/"Metric (graph)"/] + file_integrated_embedding("Integrated embedding") + file_integrated_graaf("Integrated Graph") + file_integrated_feature("Integrated Feature") + file_score("Score") + comp_transformer_embedding_to_graaf[/"Embedding to Graph"/] + comp_transformer_feature_to_embedding[/"Feature to Embedding"/] + file_common_dataset---comp_process_dataset + comp_process_dataset-->file_dataset + comp_process_dataset-->file_solution + file_dataset---comp_control_method_embedding + file_dataset---comp_control_method_graaf + file_dataset---comp_method_embedding + file_dataset---comp_method_feature + file_dataset---comp_method_graaf + file_solution---comp_metric_embedding + file_solution---comp_metric_feature + file_solution---comp_metric_graaf + comp_control_method_embedding-->file_integrated_embedding + comp_control_method_graaf-->file_integrated_graaf + comp_method_embedding-->file_integrated_embedding + comp_method_feature-->file_integrated_feature + comp_method_graaf-->file_integrated_graaf + comp_metric_embedding-->file_score + comp_metric_feature-->file_score + comp_metric_graaf-->file_score + file_integrated_embedding---comp_metric_embedding + file_integrated_embedding---comp_transformer_embedding_to_graaf + file_integrated_graaf---comp_metric_graaf + file_integrated_feature---comp_metric_feature + file_integrated_feature---comp_transformer_feature_to_embedding + comp_transformer_embedding_to_graaf-->file_integrated_graaf + comp_transformer_feature_to_embedding-->file_integrated_embedding +``` + +## File format: Common Dataset + +A subset of the common dataset. + +Example file: `resources_test/common/pancreas/dataset.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type', 'batch' + var: 'hvg', 'hvg_score', 'feature_name' + obsm: 'X_pca' + obsp: 'knn_distances', 'knn_connectivities' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'knn' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["cell_type"]` | `string` | Cell type information. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `obsp["knn_distances"]` | `double` | K nearest neighbors distance matrix. | +| `obsp["knn_connectivities"]` | `double` | K nearest neighbors connectivities matrix. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["knn"]` | `object` | Supplementary K nearest neighbors data. | + +
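+
+As a quick sanity check, the slots listed above can be inspected with
+`anndata`. The snippet below is a minimal sketch; it assumes the pancreas
+test resource has been synced into `resources_test/`.
+
+``` python
+import anndata as ad
+
+adata = ad.read_h5ad("resources_test/common/pancreas/dataset.h5ad")
+
+# layers and obs columns documented in the table above
+assert {"counts", "normalized"} <= set(adata.layers.keys())
+assert {"cell_type", "batch"} <= set(adata.obs.columns)
+
+# dataset-level metadata lives in .uns
+print(adata.uns["dataset_id"], adata.uns["normalization_id"])
+```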
+
+
+## Component type: Data processor
+
+Path:
+[`src/batch_integration`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration)
+
+A batch integration dataset processor.
+
+Arguments:
+
+ +| Name | Type | Description | +|:--------------------|:----------|:---------------------------------------------------------------------------| +| `--input` | `file` | A subset of the common dataset. | +| `--output_dataset` | `file` | (*Output*) Unintegrated AnnData HDF5 file. | +| `--output_solution` | `file` | (*Output*) Solution dataset. | +| `--obs_label` | `string` | (*Optional*) Which .obs slot to use as label. Default: `cell_type`. | +| `--obs_batch` | `string` | (*Optional*) Which .obs slot to use as batch covariate. Default: `batch`. | +| `--hvgs` | `integer` | (*Optional*) Number of highly variable genes. Default: `2000`. | +| `--subset_hvg` | `boolean` | (*Optional*) Whether to subset to highly variable genes. Default: `FALSE`. | + +
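+
+For orientation, the task's test resources already contain this processor's
+outputs for the pancreas dataset. A quick check against the documented
+behaviour (a sketch; it assumes the resources under `resources_test/` have
+been synced):
+
+``` python
+import anndata as ad
+
+dataset = ad.read_h5ad("resources_test/batch_integration/pancreas/dataset.h5ad")
+solution = ad.read_h5ad("resources_test/batch_integration/pancreas/solution.h5ad")
+
+# the columns named by --obs_label and --obs_batch end up as
+# .obs["label"] and .obs["batch"] in the output dataset
+assert {"label", "batch"} <= set(dataset.obs.columns)
+
+# the solution additionally carries the full dataset-level metadata
+print(solution.uns["dataset_name"])
+```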
+ +## File format: Dataset + +Unintegrated AnnData HDF5 file. + +Example file: `resources_test/batch_integration/pancreas/dataset.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'label' + var: 'hvg', 'hvg_score', 'feature_name' + obsm: 'X_pca' + obsp: 'knn_distances', 'knn_connectivities' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id', 'dataset_organism', 'knn' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["label"]` | `string` | label information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `obsp["knn_distances"]` | `double` | K nearest neighbors distance matrix. | +| `obsp["knn_connectivities"]` | `double` | K nearest neighbors connectivities matrix. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["knn"]` | `object` | Supplementary K nearest neighbors data. | + +
+ +## File format: Solution + +Solution dataset + +Example file: `resources_test/batch_integration/pancreas/solution.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'label' + var: 'hvg', 'hvg_score', 'feature_name' + obsm: 'X_pca' + obsp: 'knn_distances', 'knn_connectivities' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'knn' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["label"]` | `string` | label information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `obsp["knn_distances"]` | `double` | K nearest neighbors distance matrix. | +| `obsp["knn_connectivities"]` | `double` | K nearest neighbors connectivities matrix. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["knn"]` | `object` | Supplementary K nearest neighbors data. | + +
+ +## Component type: Control method (embedding) + +Path: +[`src/batch_integration/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/control_methods) + +A batch integration embedding control method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:--------------------------------------------| +| `--input` | `file` | Unintegrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) An integrated AnnData HDF5 file. | + +
+ +## Component type: Control method (graph) + +Path: +[`src/batch_integration/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/control_methods) + +A batch integration graph control method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:-----------------------------------------| +| `--input` | `file` | Unintegrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) Integrated AnnData HDF5 file. | + +
+ +## Component type: Method (embedding) + +Path: +[`src/batch_integration/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/methods) + +A batch integration embedding method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:--------------------------------------------| +| `--input` | `file` | Unintegrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) An integrated AnnData HDF5 file. | + +
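+
+To make the contract concrete, here is a minimal, hypothetical embedding
+method that simply reuses the precomputed PCA as its "integration" (roughly
+what a no-integration baseline would do). It is a sketch, not one of the
+task's actual components; components built with Viash typically receive
+these paths via a `par` dictionary rather than hard-coded strings.
+
+``` python
+import anndata as ad
+
+# hypothetical paths; a Viash component would receive these via `par`
+input_path = "resources_test/batch_integration/pancreas/dataset.h5ad"
+output_path = "integrated_embedding.h5ad"
+
+adata = ad.read_h5ad(input_path)
+
+# trivial "integration": reuse the precomputed PCA as the embedding
+output = ad.AnnData(
+    obs=adata.obs[[]],
+    obsm={"X_emb": adata.obsm["X_pca"]},
+    uns={
+        "dataset_id": adata.uns["dataset_id"],
+        "normalization_id": adata.uns["normalization_id"],
+        "method_id": "my_method",
+    },
+)
+output.write_h5ad(output_path, compression="gzip")
+```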
+ +## Component type: Method (feature) + +Path: +[`src/batch_integration/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/methods) + +A batch integration feature method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:-----------------------------------------| +| `--input` | `file` | Unintegrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) Integrated AnnData HDF5 file. | + +
+ +## Component type: Method (graph) + +Path: +[`src/batch_integration/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/methods) + +A batch integration graph method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:-----------------------------------------| +| `--input` | `file` | Unintegrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) Integrated AnnData HDF5 file. | + +
+ +## Component type: Metric (embedding) + +Path: +[`src/batch_integration/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/metrics) + +A batch integration embedding metric. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:---------------------------------| +| `--input_integrated` | `file` | An integrated AnnData HDF5 file. | +| `--input_solution` | `file` | Solution dataset. | +| `--output` | `file` | (*Output*) Metric score file. | + +
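+
+A metric component reads the integrated embedding together with the solution
+and writes a Score file (format described further below). A hedged sketch
+with a placeholder value standing in for a real metric:
+
+``` python
+import anndata as ad
+
+integrated = ad.read_h5ad(
+    "resources_test/batch_integration/pancreas/integrated_embedding.h5ad"
+)
+solution = ad.read_h5ad("resources_test/batch_integration/pancreas/solution.h5ad")
+
+# placeholder: a real metric would be computed from integrated.obsm["X_emb"]
+# and the labels/batches stored in the solution
+value = 0.5
+
+score = ad.AnnData(
+    uns={
+        "dataset_id": solution.uns["dataset_id"],
+        "normalization_id": solution.uns["normalization_id"],
+        "method_id": integrated.uns["method_id"],
+        "metric_ids": ["my_metric"],
+        "metric_values": [value],
+    }
+)
+score.write_h5ad("score.h5ad", compression="gzip")
+```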
+ +## Component type: Metric (feature) + +Path: +[`src/batch_integration/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/metrics) + +A batch integration feature metric. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:------------------------------| +| `--input_integrated` | `file` | Integrated AnnData HDF5 file. | +| `--input_solution` | `file` | Solution dataset. | +| `--output` | `file` | (*Output*) Metric score file. | + +
+ +## Component type: Metric (graph) + +Path: +[`src/batch_integration/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/metrics) + +A batch integration graph metric. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:------------------------------| +| `--input_integrated` | `file` | Integrated AnnData HDF5 file. | +| `--input_solution` | `file` | Solution dataset. | +| `--output` | `file` | (*Output*) Metric score file. | + +
+ +## File format: Integrated embedding + +An integrated AnnData HDF5 file. + +Example file: +`resources_test/batch_integration/pancreas/integrated_embedding.h5ad` + +Format: + +
+ + AnnData object + obsm: 'X_emb' + uns: 'dataset_id', 'normalization_id', 'dataset_organism', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:--------------------------------------------------------| +| `obsm["X_emb"]` | `double` | integration embedding prediction. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
+ +## File format: Integrated Graph + +Integrated AnnData HDF5 file. + +Example file: +`resources_test/batch_integration/pancreas/integrated_graph.h5ad` + +Format: + +
+ + AnnData object + obsp: 'connectivities', 'distances' + uns: 'dataset_id', 'normalization_id', 'dataset_organism', 'method_id', 'neighbors' + +
+ +Slot description: + +
+
+| Slot                      | Type     | Description                                              |
+|:--------------------------|:---------|:---------------------------------------------------------|
+| `obsp["connectivities"]`  | `double` | Neighbors connectivities matrix.                          |
+| `obsp["distances"]`       | `double` | Neighbors distances matrix.                               |
+| `uns["dataset_id"]`       | `string` | A unique identifier for the dataset.                      |
+| `uns["normalization_id"]` | `string` | Which normalization was used.                             |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset.   |
+| `uns["method_id"]`        | `string` | A unique identifier for the method.                       |
+| `uns["neighbors"]`        | `object` | Supplementary K nearest neighbors data.                   |
+
+ +## File format: Integrated Feature + +Integrated AnnData HDF5 file. + +Example file: +`resources_test/batch_integration/pancreas/integrated_feature.h5ad` + +Format: + +
+ + AnnData object + layers: 'corrected_counts' + uns: 'dataset_id', 'normalization_id', 'dataset_organism', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:---------|:--------------------------------------------------------| +| `layers["corrected_counts"]` | `double` | Corrected counts after integration. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
+ +## File format: Score + +Metric score file + +Example file: `score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
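+
+The paired `metric_ids` / `metric_values` arrays can be collected into a
+small results table, e.g.:
+
+``` python
+import anndata as ad
+import numpy as np
+import pandas as pd
+
+score = ad.read_h5ad("score.h5ad")
+
+# np.atleast_1d guards against a single metric being stored as a scalar
+results = pd.DataFrame({
+    "metric_id": np.atleast_1d(score.uns["metric_ids"]),
+    "metric_value": np.atleast_1d(score.uns["metric_values"]),
+})
+print(results)
+```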
+ +## Component type: Embedding to Graph + +Path: +[`src/batch_integration/transformers`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/transformers) + +Transform an embedding to a graph output. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:-----------------------------------------| +| `--input` | `file` | An integrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) Integrated AnnData HDF5 file. | + +
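+
+One plausible implementation (a sketch that assumes `scanpy` is available,
+not necessarily the component's exact code) builds the k-nearest-neighbour
+graph on the integrated embedding:
+
+``` python
+import anndata as ad
+import scanpy as sc
+
+adata = ad.read_h5ad(
+    "resources_test/batch_integration/pancreas/integrated_embedding.h5ad"
+)
+
+# compute the neighbour graph on the integrated embedding
+sc.pp.neighbors(adata, use_rep="X_emb")
+
+# scanpy stores the graph in .obsp and its bookkeeping in .uns["neighbors"],
+# matching the Integrated Graph format described above
+assert {"connectivities", "distances"} <= set(adata.obsp.keys())
+adata.write_h5ad("integrated_graph.h5ad", compression="gzip")
+```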
+ +## Component type: Feature to Embedding + +Path: +[`src/batch_integration/transformers`](https://github.com/openproblems-bio/openproblems/tree/main/src/batch_integration/transformers) + +Transform a feature output to an embedding. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:--------------------------------------------| +| `--input` | `file` | Integrated AnnData HDF5 file. | +| `--output` | `file` | (*Output*) An integrated AnnData HDF5 file. | + +
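+
+Similarly, a feature output can be turned into an embedding by running PCA
+on the corrected counts. Another sketch (again assuming `scanpy`, and not
+necessarily the component's exact code):
+
+``` python
+import anndata as ad
+import scanpy as sc
+
+adata = ad.read_h5ad(
+    "resources_test/batch_integration/pancreas/integrated_feature.h5ad"
+)
+
+# PCA on the corrected counts yields a (trivial) embedding
+adata.X = adata.layers["corrected_counts"]
+sc.pp.pca(adata, n_comps=20)
+adata.obsm["X_emb"] = adata.obsm["X_pca"]
+
+adata.write_h5ad("integrated_embedding.h5ad", compression="gzip")
+```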
+ diff --git a/src/tasks/batch_integration/api/comp_control_method_embedding.yaml b/src/tasks/batch_integration/api/comp_control_method_embedding.yaml new file mode 100644 index 0000000000..9c4bc65ce5 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_control_method_embedding.yaml @@ -0,0 +1,26 @@ +functionality: + namespace: batch_integration/control_methods + info: + type: control_method + subtype: embedding + type_info: + label: Control method (embedding) + summary: A batch integration embedding control method. + description: | + A batch integration control method which outputs a batch-corrected embedding. + arguments: + - name: --input + __merge__: file_dataset.yaml + direction: input + required: true + - name: --output + direction: output + __merge__: file_integrated_embedding.yaml + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_control_method_feature.yaml b/src/tasks/batch_integration/api/comp_control_method_feature.yaml new file mode 100644 index 0000000000..3d2ac9853d --- /dev/null +++ b/src/tasks/batch_integration/api/comp_control_method_feature.yaml @@ -0,0 +1,26 @@ +functionality: + namespace: batch_integration/control_methods + info: + type: control_method + subtype: feature + type_info: + label: Control method (feature) + summary: A batch integration feature control method. + description: | + A batch integration control method which outputs a batch-corrected feature space. + arguments: + - name: --input + __merge__: file_dataset.yaml + direction: input + required: true + - name: --output + direction: output + __merge__: file_integrated_feature.yaml + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_control_method_graph.yaml b/src/tasks/batch_integration/api/comp_control_method_graph.yaml new file mode 100644 index 0000000000..cba6f48f7a --- /dev/null +++ b/src/tasks/batch_integration/api/comp_control_method_graph.yaml @@ -0,0 +1,26 @@ +functionality: + namespace: batch_integration/control_methods + info: + type: control_method + subtype: graph + type_info: + label: Control method (graph) + summary: A batch integration graph control method. + description: | + A batch integration control method which outputs a batch-corrected cell graphs. 
+ arguments: + - __merge__: file_dataset.yaml + name: --input + direction: input + required: true + - __merge__: file_integrated_graph.yaml + name: --output + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_method_embedding.yaml b/src/tasks/batch_integration/api/comp_method_embedding.yaml new file mode 100644 index 0000000000..86e7d7caf3 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_method_embedding.yaml @@ -0,0 +1,29 @@ +functionality: + namespace: batch_integration/methods + info: + type: method + subtype: embedding + type_info: + label: Method (embedding) + summary: A batch integration embedding method. + description: | + A batch integration method which outputs a batch-corrected embedding. + arguments: + - name: --input + __merge__: file_dataset.yaml + direction: input + required: true + - name: --output + __merge__: file_integrated_embedding.yaml + direction: output + required: true + test_resources: + # check method component + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - path: /src/common/library.bib + # auto-run component + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_method_feature.yaml b/src/tasks/batch_integration/api/comp_method_feature.yaml new file mode 100644 index 0000000000..d609c2dd5b --- /dev/null +++ b/src/tasks/batch_integration/api/comp_method_feature.yaml @@ -0,0 +1,29 @@ +functionality: + namespace: batch_integration/methods + info: + type: method + subtype: feature + type_info: + label: Method (feature) + summary: A batch integration feature method. + description: | + A batch integration method which outputs a batch-corrected feature-space. + arguments: + - name: --input + __merge__: file_dataset.yaml + direction: input + required: true + - name: --output + __merge__: file_integrated_feature.yaml + direction: output + required: true + test_resources: + # check method component + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - path: /src/common/library.bib + # auto-run component + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_method_graph.yaml b/src/tasks/batch_integration/api/comp_method_graph.yaml new file mode 100644 index 0000000000..2f37146e24 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_method_graph.yaml @@ -0,0 +1,29 @@ +functionality: + namespace: batch_integration/methods + info: + type: method + subtype: graph + type_info: + label: Method (graph) + summary: A batch integration graph method. + description: | + A batch integration method which outputs a batch-corrected cell graphs. 
+ arguments: + - name: --input + __merge__: file_dataset.yaml + direction: input + required: true + - name: --output + __merge__: file_integrated_graph.yaml + direction: output + required: true + test_resources: + # check method component + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - path: /src/common/library.bib + # auto-run component + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas diff --git a/src/tasks/batch_integration/api/comp_metric_embedding.yaml b/src/tasks/batch_integration/api/comp_metric_embedding.yaml new file mode 100644 index 0000000000..7443fca8b4 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_metric_embedding.yaml @@ -0,0 +1,38 @@ +functionality: + namespace: batch_integration/metrics + info: + type: metric + subtype: embedding + type_info: + label: Metric (embedding) + summary: A batch integration embedding metric. + description: | + A metric for evaluating batch corrected embeddings. + test_setup: + pancreas: + input_integrated: resources_test/batch_integration/pancreas/integrated_embedding.h5ad + input_solution: resources_test/batch_integration/pancreas/solution.h5ad + cellxgene_census: + input_integrated: resources_test/batch_integration/cxg_mouse_pancreas_atlas/integrated_embedding.h5ad + input_solution: resources_test/batch_integration/cxg_mouse_pancreas_atlas/solution.h5ad + arguments: + - name: --input_integrated + __merge__: file_integrated_embedding.yaml + direction: input + required: true + - name: --input_solution + __merge__: file_solution.yaml + direction: input + required: true + - name: --output + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - path: /resources_test/batch_integration/ + dest: resources_test/batch_integration/ + # - type: python_script + # path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/batch_integration/api/comp_metric_feature.yaml b/src/tasks/batch_integration/api/comp_metric_feature.yaml new file mode 100644 index 0000000000..2f741d0aa2 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_metric_feature.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: batch_integration/metrics + info: + type: metric + subtype: feature + type_info: + label: Metric (feature) + summary: A batch integration feature metric. + description: | + A metric for evaluating batch corrected feature spaces. 
+ arguments: + - name: --input_integrated + __merge__: file_integrated_feature.yaml + direction: input + required: true + - name: --input_solution + __merge__: file_solution.yaml + direction: input + required: true + - name: --output + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/batch_integration/api/comp_metric_graph.yaml b/src/tasks/batch_integration/api/comp_metric_graph.yaml new file mode 100644 index 0000000000..66935b9663 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_metric_graph.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: batch_integration/metrics + info: + type: metric + subtype: graph + type_info: + label: Metric (graph) + summary: A batch integration graph metric. + description: | + A metric for evaluating batch corrected cell graphs. + arguments: + - name: --input_integrated + __merge__: file_integrated_graph.yaml + direction: input + required: true + - name: --input_solution + __merge__: file_solution.yaml + direction: input + required: true + - name: --output + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/batch_integration/api/comp_process_dataset.yaml b/src/tasks/batch_integration/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..715ef6d3c3 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_process_dataset.yaml @@ -0,0 +1,45 @@ +functionality: + namespace: batch_integration + info: + type: process_dataset + type_info: + label: Data processor + summary: A label projection dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. + arguments: + - name: "--input" + __merge__: file_common_dataset.yaml + direction: input + required: true + - name: "--output_dataset" + __merge__: file_dataset.yaml + direction: output + required: true + - name: "--output_solution" + __merge__: file_solution.yaml + direction: output + required: true + - name: "--obs_label" + type: "string" + description: "Which .obs slot to use as label." + default: "cell_type" + - name: "--obs_batch" + type: "string" + description: "Which .obs slot to use as batch covariate." 
+ default: "batch" + - name: --hvgs + type: integer + description: Number of highly variable genes + default: 2000 + required: false + - name: --subset_hvg + type: boolean + description: Whether to subset to highly variable genes + default: false + required: false + test_resources: + - path: /resources_test/common/pancreas/ + dest: resources_test/common/pancreas/ + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py \ No newline at end of file diff --git a/src/tasks/batch_integration/api/comp_transformer_embedding_to_graph.yaml b/src/tasks/batch_integration/api/comp_transformer_embedding_to_graph.yaml new file mode 100644 index 0000000000..d8e815dad5 --- /dev/null +++ b/src/tasks/batch_integration/api/comp_transformer_embedding_to_graph.yaml @@ -0,0 +1,25 @@ +functionality: + namespace: batch_integration/transformers + info: + type: transformer + subtype: graph + type_info: + label: Embedding to Graph + summary: Transform an embedding to a graph output. + description: | + Transform an embedding to a graph output by applying the k nearest neighbors algorithm. + arguments: + - name: --input + __merge__: file_integrated_embedding.yaml + direction: input + required: true + - name: --output + __merge__: file_integrated_graph.yaml + direction: output + required: true + test_resources: + # auto-run component + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas \ No newline at end of file diff --git a/src/tasks/batch_integration/api/comp_transformer_feature_to_embedding.yaml b/src/tasks/batch_integration/api/comp_transformer_feature_to_embedding.yaml new file mode 100644 index 0000000000..788e4b965a --- /dev/null +++ b/src/tasks/batch_integration/api/comp_transformer_feature_to_embedding.yaml @@ -0,0 +1,25 @@ +functionality: + namespace: batch_integration/transformers + info: + type: transformer + subtype: embedding + type_info: + label: Feature to Embedding + summary: Transform a feature output to an embedding. + description: | + Transform a feature output to an embedding by computing a PCA on the corrected counts. + arguments: + - name: --input + __merge__: file_integrated_feature.yaml + direction: input + required: true + - name: --output + __merge__: file_integrated_embedding.yaml + direction: output + required: true + test_resources: + # auto-run component + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/batch_integration/pancreas + dest: resources_test/batch_integration/pancreas \ No newline at end of file diff --git a/src/tasks/batch_integration/api/file_common_dataset.yaml b/src/tasks/batch_integration/api/file_common_dataset.yaml new file mode 100644 index 0000000000..097a6794a1 --- /dev/null +++ b/src/tasks/batch_integration/api/file_common_dataset.yaml @@ -0,0 +1,92 @@ +# This file is based on the spec of the common dataset located at +# `src/datasets/api/file_common_dataset.yaml`. However, some fields +# such as obs.cell_type and obs.batch are now required +type: file +example: "resources_test/common/pancreas/dataset.h5ad" +info: + label: "Common Dataset" + summary: A subset of the common dataset. 
+ slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: cell_type + description: Cell type information + required: true + - type: string + name: batch + description: Batch information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + obsp: + - type: double + name: knn_distances + description: K nearest neighbors distance matrix. + required: true + - type: double + name: knn_connectivities + description: K nearest neighbors connectivities matrix. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: object + name: knn + description: Supplementary K nearest neighbors data. + required: true + diff --git a/src/tasks/batch_integration/api/file_dataset.yaml b/src/tasks/batch_integration/api/file_dataset.yaml new file mode 100644 index 0000000000..6d1eb928d8 --- /dev/null +++ b/src/tasks/batch_integration/api/file_dataset.yaml @@ -0,0 +1,69 @@ +type: file +example: "resources_test/batch_integration/pancreas/dataset.h5ad" +info: + label: "Dataset" + summary: Unintegrated AnnData HDF5 file. + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: string + name: label + description: label information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + obsp: + - type: double + name: knn_distances + description: K nearest neighbors distance matrix. + required: true + - type: double + name: knn_connectivities + description: K nearest neighbors connectivities matrix. 
+ required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: object + name: knn + description: Supplementary K nearest neighbors data. + required: true + diff --git a/src/tasks/batch_integration/api/file_integrated_embedding.yaml b/src/tasks/batch_integration/api/file_integrated_embedding.yaml new file mode 100644 index 0000000000..aa526abe71 --- /dev/null +++ b/src/tasks/batch_integration/api/file_integrated_embedding.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/batch_integration/pancreas/integrated_embedding.h5ad" +info: + prediction_type: embedding + label: "Integrated embedding" + summary: An integrated AnnData HDF5 file. + slots: + obsm: + - type: double + name: X_emb + description: integration embedding prediction + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: method_id + description: "A unique identifier for the method" + required: true diff --git a/src/tasks/batch_integration/api/file_integrated_feature.yaml b/src/tasks/batch_integration/api/file_integrated_feature.yaml new file mode 100644 index 0000000000..b89e16f907 --- /dev/null +++ b/src/tasks/batch_integration/api/file_integrated_feature.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/batch_integration/pancreas/integrated_feature.h5ad" +info: + prediction_type: feature + label: "Integrated Feature" + summary: Integrated AnnData HDF5 file. + slots: + layers: + - type: double + name: corrected_counts + description: Corrected counts after integration + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: method_id + description: "A unique identifier for the method" + required: true \ No newline at end of file diff --git a/src/tasks/batch_integration/api/file_integrated_graph.yaml b/src/tasks/batch_integration/api/file_integrated_graph.yaml new file mode 100644 index 0000000000..8c09147d0d --- /dev/null +++ b/src/tasks/batch_integration/api/file_integrated_graph.yaml @@ -0,0 +1,37 @@ +type: file +example: "resources_test/batch_integration/pancreas/integrated_graph.h5ad" +info: + prediction_type: graph + label: "Integrated Graph" + summary: Integrated AnnData HDF5 file. + slots: + obsp: + - type: double + name: connectivities + description: Neighbors connectivities matrix. + required: true + - type: double + name: distances + description: Neighbors connectivities matrix. 
+ required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: object + name: neighbors + description: Supplementary K nearest neighbors data. + required: true diff --git a/src/tasks/batch_integration/api/file_score.yaml b/src/tasks/batch_integration/api/file_score.yaml new file mode 100644 index 0000000000..9b4dac654f --- /dev/null +++ b/src/tasks/batch_integration/api/file_score.yaml @@ -0,0 +1,29 @@ +type: file +example: "score.h5ad" +info: + label: "Score" + summary: "Metric score file" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + required: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true + required: true \ No newline at end of file diff --git a/src/tasks/batch_integration/api/file_solution.yaml b/src/tasks/batch_integration/api/file_solution.yaml new file mode 100644 index 0000000000..7e8b07ea4c --- /dev/null +++ b/src/tasks/batch_integration/api/file_solution.yaml @@ -0,0 +1,89 @@ +type: file +example: "resources_test/batch_integration/pancreas/solution.h5ad" +info: + label: "Solution" + summary: Solution dataset + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: string + name: label + description: label information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + obsp: + - type: double + name: knn_distances + description: K nearest neighbors distance matrix. + required: true + - type: double + name: knn_connectivities + description: K nearest neighbors connectivities matrix. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. 
+ required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: object + name: knn + description: Supplementary K nearest neighbors data. + required: true + diff --git a/src/tasks/batch_integration/api/task_info.yaml b/src/tasks/batch_integration/api/task_info.yaml new file mode 100644 index 0000000000..bc3a575029 --- /dev/null +++ b/src/tasks/batch_integration/api/task_info.yaml @@ -0,0 +1,41 @@ +name: batch_integration +label: Batch Integration +v1: + path: openproblems/tasks/batch_integration/README.md + commit: 637163fba7d74ab5393c2adbee5354dcf4d46f85 +summary: Remove unwanted batch effects from scRNA data while retaining biologically meaningful variation. +image: thumbnail.svg +motivation: | + As single-cell technologies advance, single-cell datasets are growing both in size and complexity. + Especially in consortia such as the Human Cell Atlas, individual studies combine data from multiple labs, each sequencing multiple individuals possibly with different technologies. + This gives rise to complex batch effects in the data that must be computationally removed to perform a joint analysis. + These batch integration methods must remove the batch effect while not removing relevant biological information. + Currently, over 200 tools exist that aim to remove batch effects from scRNA-seq datasets [@zappia2018exploring]. + These methods balance the removal of batch effects with the conservation of nuanced biological information in different ways. + This abundance of tools has complicated batch integration method choice, leading to several benchmarks on this topic [@luecken2020benchmarking; @tran2020benchmark; @chazarragil2021flexible; @mereu2020benchmarking]. + Yet, benchmarks use different metrics, method implementations and datasets. Here we build a living benchmarking task for batch integration methods with the vision of improving the consistency of method evaluation. +description: | + In this task we evaluate batch integration methods on their ability to remove batch effects in the data while conserving variation attributed to biological effects. + As input, methods require either normalised or unnormalised data with multiple batches and consistent cell type labels. + The batch integrated output can be a feature matrix, a low dimensional embedding and/or a neighbourhood graph. + The respective batch-integrated representation is then evaluated using sets of metrics that capture how well batch effects are removed and whether biological variance is conserved. + We have based this particular task on the latest and most extensive benchmark of single-cell data integration methods. 
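Taken together, the file specifications above define a simple I/O contract for method components in this task: read the unintegrated dataset (with `counts`/`normalized` layers, `batch` and `label` in `.obs`, `X_pca` in `.obsm`, and `dataset_id`/`normalization_id` in `.uns`) and write back only the slots required for the chosen output type. The sketch below is a minimal illustration of that contract for an embedding-type output, using plain `anndata`/`scanpy` instead of the repository's `read_anndata_partial` helper; the `par`/`meta` dictionaries mimic what Viash injects at runtime, the method name and the PCA stand-in are purely illustrative, and the output fields follow `file_integrated_embedding.yaml`.

# Minimal sketch of an embedding-type integration component (illustrative only).
import anndata as ad
import scanpy as sc

par = {"input": "resources_test/batch_integration/pancreas/dataset.h5ad", "output": "output.h5ad"}
meta = {"functionality_name": "my_method"}  # hypothetical method name

adata = ad.read_h5ad(par["input"])

# A real method would integrate across adata.obs["batch"] here; PCA on the
# normalized layer is used as a stand-in to produce an n_obs x n_dims embedding.
adata.X = adata.layers["normalized"]
sc.pp.pca(adata, n_comps=50)

# Write only the slots required by file_integrated_embedding.yaml.
output = ad.AnnData(
    obs=adata.obs[[]],
    obsm={"X_emb": adata.obsm["X_pca"]},
    uns={
        "dataset_id": adata.uns["dataset_id"],
        "normalization_id": adata.uns["normalization_id"],
        "method_id": meta["functionality_name"],
    },
)
output.write_h5ad(par["output"], compression="gzip")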
+authors: + - name: Michaela Mueller + roles: [ maintainer, author ] + info: + github: mumichae + - name: Kai Waldrant + roles: [ contributor ] + info: + github: KaiWaldrant + orcid: "0009-0003-8555-1361" + - name: Robrecht Cannoodt + roles: [ contributor ] + info: + github: rcannood + orcid: "0000-0003-3641-729X" + - name: Daniel Strobl + roles: [ author ] + info: + github: danielStrobl diff --git a/src/tasks/batch_integration/api/thumbnail.svg b/src/tasks/batch_integration/api/thumbnail.svg new file mode 100644 index 0000000000..77626c5bfb --- /dev/null +++ b/src/tasks/batch_integration/api/thumbnail.svg @@ -0,0 +1 @@ +Batch 1Batch 2dim-2dim-1dim-2dim-1 \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/no_integration/batch_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/no_integration/batch_embed/config.vsh.yaml new file mode 100644 index 0000000000..c2484fbaa2 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/batch_embed/config.vsh.yaml @@ -0,0 +1,24 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: batch_embed + namespace: batch_integration/control_methods/no_integration + info: + label: No integration by Batch + summary: "Cells are embedded by computing PCA independently on each batch" + description: "Cells are embedded by computing PCA independently on each batch" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/batch_integration/control_methods/no_integration/batch_embed/script.py b/src/tasks/batch_integration/control_methods/no_integration/batch_embed/script.py new file mode 100644 index 0000000000..801440ce65 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/batch_embed/script.py @@ -0,0 +1,49 @@ +import sys +import scanpy as sc +import numpy as np + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} + +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) +adata.var["highly_variable"] = adata.var["hvg"] + +print("Process dataset", flush=True) +adata.obsm["X_emb"] = np.zeros((adata.shape[0], 50), dtype=float) +for batch in adata.obs["batch"].unique(): + batch_idx = adata.obs["batch"] == batch + n_comps = min(50, np.sum(batch_idx)) + solver = "full" if n_comps == np.sum(batch_idx) else "arpack" + adata.obsm["X_emb"][batch_idx, :n_comps] = sc.tl.pca( + adata[batch_idx].copy(), + n_comps=n_comps, + use_highly_variable=True, + svd_solver=solver, + copy=True, + ).obsm["X_pca"] + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_embed/config.vsh.yaml 
b/src/tasks/batch_integration/control_methods/no_integration/global_embed/config.vsh.yaml new file mode 100644 index 0000000000..95212518c5 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_embed/config.vsh.yaml @@ -0,0 +1,24 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: global_embed + namespace: batch_integration/control_methods/no_integration + info: + label: No integration + summary: "Cells are embedded by PCA on the unintegrated data" + description: "Cells are embedded by PCA on the unintegrated data" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_embed/script.py b/src/tasks/batch_integration/control_methods/no_integration/global_embed/script.py new file mode 100644 index 0000000000..f45038806b --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_embed/script.py @@ -0,0 +1,36 @@ +import sys +import scanpy as sc + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print("process dataset", flush=True) +adata.obsm["X_emb"] = adata.obsm["X_pca"] + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_feature/config.vsh.yaml b/src/tasks/batch_integration/control_methods/no_integration/global_feature/config.vsh.yaml new file mode 100644 index 0000000000..b20701c8f1 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_feature/config.vsh.yaml @@ -0,0 +1,24 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_feature.yaml +functionality: + name: global_feature + namespace: batch_integration/control_methods/no_integration + info: + label: No integration + summary: "Original feature space is not modified" + description: "Original feature space is not modified" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_feature/script.py b/src/tasks/batch_integration/control_methods/no_integration/global_feature/script.py new file mode 100644 index 
0000000000..2acdbf9b7a --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_feature/script.py @@ -0,0 +1,38 @@ +import sys +import scanpy as sc + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +# no processing, subset matrix to highly variable genes +adata_hvg = adata[:, adata.var["hvg"]].copy() +adata.layers['corrected_counts'] = adata_hvg.X.copy() + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_graph/config.vsh.yaml b/src/tasks/batch_integration/control_methods/no_integration/global_graph/config.vsh.yaml new file mode 100644 index 0000000000..86886ce263 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_graph/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_graph.yaml +functionality: + name: global_graph + namespace: batch_integration/control_methods/no_integration + info: + label: No integration + summary: "kNN graph is built on the PCA of the unintegrated data" + description: "Cells are embedded by PCA on the unintegrated data. A kNN graph is built on this PCA." + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/no_integration/global_graph/script.py b/src/tasks/batch_integration/control_methods/no_integration/global_graph/script.py new file mode 100644 index 0000000000..4824c8f443 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/no_integration/global_graph/script.py @@ -0,0 +1,41 @@ +import scanpy as sc +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _set_uns +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsp='obsp', + uns='uns' +) + +print("process dataset", flush=True) +neighbors_map = adata.uns['knn'] +adata.obsp['connectivities'] = adata.obsp[neighbors_map['connectivities_key']] +adata.obsp['distances'] = adata.obsp[neighbors_map['distances_key']] +_set_uns(adata, neighbors_key='knn') + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') \ No newline 
at end of file diff --git a/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/config.vsh.yaml new file mode 100644 index 0000000000..6c853a7719 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: celltype_embed + namespace: batch_integration/control_methods/perfect_integration + info: + label: Perfect embedding by cell type + summary: "Cells are embedded as a one-hot encoding of celltype labels" + description: "Cells are embedded as a one-hot encoding of celltype labels" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/script.py b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/script.py new file mode 100644 index 0000000000..ca16a60ab2 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_embed/script.py @@ -0,0 +1,34 @@ +import anndata as ad +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} + +## VIASH END +sys.path.append(meta["resources_dir"]) +from utils import _perfect_embedding +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + uns='uns' +) + +print('Process data...', flush=True) +adata.obsm["X_emb"] = _perfect_embedding(partition=adata.obs["label"]) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/config.vsh.yaml new file mode 100644 index 0000000000..e945e3bc58 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/config.vsh.yaml @@ -0,0 +1,29 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: celltype_jitter_embed + namespace: batch_integration/control_methods/perfect_integration + info: + label: Perfect embedding by celltype with jitter + summary: "Cells are embedded as a one-hot encoding of celltype labels, with a small amount of random noise added to the embedding" + description: "Cells are embedded as a one-hot encoding of celltype labels, with a small amount of random noise added to the embedding" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + arguments: + - name: "--jitter" + type: double + 
default: 0.01 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/script.py b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/script.py new file mode 100644 index 0000000000..8f88f77472 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/perfect_integration/celltype_jitter_embed/script.py @@ -0,0 +1,38 @@ +import anndata as ad +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'jitter': 0.01, +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} + +## VIASH END +sys.path.append(meta["resources_dir"]) +from utils import _perfect_embedding +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + uns='uns' +) + +print('Process data...', flush=True) +adata.obsm["X_emb"] = _perfect_embedding( + partition=adata.obs["label"], + jitter=par["jitter"] +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/batch_embed/config.vsh.yaml new file mode 100644 index 0000000000..d8bcee01d4 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_embed/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: batch_embed + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration by batch + summary: "Embedding coordinates are randomly permuted within each batch" + description: "Embedding coordinates are randomly permuted within each batch" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_embed/script.py b/src/tasks/batch_integration/control_methods/random_integration/batch_embed/script.py new file mode 100644 index 0000000000..175a449a49 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_embed/script.py @@ -0,0 +1,40 @@ +import sys +import scanpy as sc + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import 
read_anndata + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print("process dataset", flush=True) +adata.obsm["X_emb"] = _randomize_features( + adata.obsm["X_pca"], + partition=adata.obs["batch"], +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_feature/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/batch_feature/config.vsh.yaml new file mode 100644 index 0000000000..5f98284bb9 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_feature/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_feature.yaml +functionality: + name: batch_feature + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration by batch + summary: "Feature values are randomly permuted within each batch" + description: "Feature values are randomly permuted within each batch" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: acf5c95a7306b819c4a13972783433d0a48f769b + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_feature/script.py b/src/tasks/batch_integration/control_methods/random_integration/batch_feature/script.py new file mode 100644 index 0000000000..630871e780 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_feature/script.py @@ -0,0 +1,41 @@ +import anndata as ad +import sys + + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad' +} + +meta = { + 'functionality_name': 'foo', + 'config': 'bar', +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +adata.layers['corrected_counts'] = _randomize_features( + adata.X, + partition=adata.obs["batch"], +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_graph/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/batch_graph/config.vsh.yaml new file mode 100644 index 0000000000..72a12c5031 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_graph/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_graph.yaml +functionality: + name: batch_graph + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration + summary: "Graph connectivity values are randomly permuted within each batch" + description: "Graph connectivity values are 
randomly permuted within each batch" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/batch_graph/script.py b/src/tasks/batch_integration/control_methods/random_integration/batch_graph/script.py new file mode 100644 index 0000000000..d5c20aa185 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/batch_graph/script.py @@ -0,0 +1,41 @@ +import anndata as ad +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad' +} + +meta = { + 'functionality_name': 'foo', + 'config': 'bar', +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_graph +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsp='obsp', + uns='uns' +) + +print('Randomize graph...', flush=True) +adata = _randomize_graph( + adata, + neighbors_key="knn", + partition=adata.obs["batch"], +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/config.vsh.yaml new file mode 100644 index 0000000000..b4457498c9 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: celltype_embed + namespace: batch_integration/control_methods/random_integration + info: + label: Random embedding by cell type + summary: "Embedding coordinates are randomized within celltype labels" + description: "Embedding coordinates are randomized within celltype labels" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/script.py b/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/script.py new file mode 100644 index 0000000000..bf26568079 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_embed/script.py @@ -0,0 +1,38 @@ +import anndata as ad +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} + +## VIASH END 
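# Note on the helper used below: _randomize_features (defined in
# src/tasks/batch_integration/control_methods/utils.py, included later in this
# diff) permutes the rows of an array within each partition group, roughly
# X_out[idx] = X[np.random.permutation(idx)] for the indices idx of every
# unique partition value. Called with partition=adata.obs["label"], this
# control shuffles the existing PCA coordinates among cells of the same cell
# type: batch structure is destroyed while cells of a given type remain
# interchangeable, which is the intended behaviour of this random-integration
# baseline.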
+sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print('Process data...', flush=True) +adata.obsm["X_emb"] = _randomize_features( + adata.obsm["X_pca"], + partition=adata.obs["label"] +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/config.vsh.yaml new file mode 100644 index 0000000000..7c483739c2 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_feature.yaml +functionality: + name: celltype_feature + namespace: batch_integration/control_methods/random_integration + info: + label: Random feature by cell type + summary: "Features are randomized within celltype labels" + description: "Features are randomized within celltype labels" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/script.py b/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/script.py new file mode 100644 index 0000000000..9f1302df0d --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_feature/script.py @@ -0,0 +1,42 @@ +import sys +import scanpy as sc + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +print("Process data...", flush=True) +adata.layers['corrected_counts'] = _randomize_features( + adata.X, + partition=adata.obs["label"] +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/config.vsh.yaml new file mode 100644 index 0000000000..6015185616 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_graph.yaml +functionality: + name: celltype_graph + namespace: 
batch_integration/control_methods/random_integration + info: + label: Random graph by cell type + summary: "Graph connectivities are randomized within celltype labels" + description: "Graph connectivities are randomized within celltype labels" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/script.py b/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/script.py new file mode 100644 index 0000000000..3634d55dbd --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/celltype_graph/script.py @@ -0,0 +1,41 @@ +import sys +import scanpy as sc + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_graph +from read_anndata_partial import read_anndata + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsp='obsp', + uns='uns' +) + +print("Process data...", flush=True) +adata = _randomize_graph( + adata, + neighbors_key="knn", + partition=adata.obs["label"], +) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_embed/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/global_embed/config.vsh.yaml new file mode 100644 index 0000000000..0343c37817 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_embed/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_embedding.yaml +functionality: + name: global_embed + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration + summary: "Embedding coordinates are randomly permuted" + description: "Embedding coordinates are randomly permuted" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_embed/script.py b/src/tasks/batch_integration/control_methods/random_integration/global_embed/script.py new file mode 100644 index 0000000000..ca626600b8 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_embed/script.py @@ -0,0 +1,37 @@ +import sys +import scanpy as sc + +## VIASH START + 
+par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality': 'foo', + 'config': 'bar', + "resources_dir": "src/tasks/batch_integration/control_methods/" +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import read_anndata + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print("process dataset", flush=True) +adata.obsm["X_emb"] = _randomize_features(adata.obsm["X_pca"]) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_feature/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/global_feature/config.vsh.yaml new file mode 100644 index 0000000000..f49ee146a1 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_feature/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: ../../../api/comp_control_method_feature.yaml +functionality: + name: global_feature + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration + summary: "Feature values are randomly permuted" + description: "Feature values are randomly permuted" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: acf5c95a7306b819c4a13972783433d0a48f769b + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] \ No newline at end of file diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_feature/script.py b/src/tasks/batch_integration/control_methods/random_integration/global_feature/script.py new file mode 100644 index 0000000000..c74c7d2a5e --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_feature/script.py @@ -0,0 +1,37 @@ +import anndata as ad +import sys + + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad' +} + +meta = { + 'functionality_name': 'foo', + 'config': 'bar', +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_features +from read_anndata_partial import read_anndata + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +adata.layers['corrected_counts'] = _randomize_features(adata.X) + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_graph/config.vsh.yaml b/src/tasks/batch_integration/control_methods/random_integration/global_graph/config.vsh.yaml new file mode 100644 index 0000000000..1b92cbc70a --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_graph/config.vsh.yaml @@ -0,0 +1,25 @@ +# use method api spec +__merge__: 
../../../api/comp_control_method_graph.yaml +functionality: + name: global_graph + namespace: batch_integration/control_methods/random_integration + info: + label: Random integration + summary: "Graph connectivity values are randomly permuted" + description: "Graph connectivity values are randomly permuted" + v1: + path: openproblems/tasks/_batch_integration/_common/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py + - path: ../../utils.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [ "midtime", "lowmem", "lowcpu"] diff --git a/src/tasks/batch_integration/control_methods/random_integration/global_graph/script.py b/src/tasks/batch_integration/control_methods/random_integration/global_graph/script.py new file mode 100644 index 0000000000..cd4d64f043 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/random_integration/global_graph/script.py @@ -0,0 +1,37 @@ +import anndata as ad +import sys + +## VIASH START + +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad' +} + +meta = { + 'functionality_name': 'foo', + 'config': 'bar', +} + +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from utils import _randomize_graph +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsp='obsp', + uns='uns' +) + +print('Randomize graph...', flush=True) +adata = _randomize_graph(adata, neighbors_key="knn") + +print("Store outputs", flush=True) +adata.uns['method_id'] = meta['functionality_name'] +adata.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/control_methods/utils.py b/src/tasks/batch_integration/control_methods/utils.py new file mode 100644 index 0000000000..954e24af26 --- /dev/null +++ b/src/tasks/batch_integration/control_methods/utils.py @@ -0,0 +1,56 @@ +import numpy as np + + +def _set_uns(adata, neighbors_key): + adata.uns["neighbors"] = adata.uns[neighbors_key] + adata.uns["neighbors"]["connectivities_key"] = "connectivities" + adata.uns["neighbors"]["distances_key"] = "distances" + + +def _randomize_features(X, partition=None): + """ + Taken and adapted from opsca-v1: + https://github.com/openproblems-bio/openproblems/blob/acf5c95a7306b819c4a13972783433d0a48f769b/openproblems/tasks/_batch_integration/_common/methods/baseline.py#L13 + """ + X_out = X.copy() + if partition is None: + partition = np.full(X.shape[0], 0) + else: + partition = np.asarray(partition) + for partition_name in np.unique(partition): + partition_idx = np.argwhere(partition == partition_name).flatten() + X_out[partition_idx] = X[np.random.permutation(partition_idx)] + return X_out + + +def _randomize_graph(adata, partition=None, neighbors_key="neighbors"): + """ + Taken and adapted from opsca-v1: + https://github.com/openproblems-bio/openproblems/blob/acf5c95a7306b819c4a13972783433d0a48f769b/openproblems/tasks/_batch_integration/_common/methods/baseline.py#L25 + """ + knn_map = adata.uns[neighbors_key] + distances, connectivities = ( + adata.obsp[knn_map["distances_key"]], + adata.obsp[knn_map["connectivities_key"]], + ) + new_idx = _randomize_features(np.arange(distances.shape[0]), partition=partition) + adata.obsp["distances"] = 
distances[new_idx][:, new_idx] + adata.obsp["connectivities"] = connectivities[new_idx][:, new_idx] + _set_uns(adata, neighbors_key) + return adata + + +def _perfect_embedding(partition, jitter=0.01): + """ + Taken and adapted from opsca-v1: + https://github.com/openproblems-bio/openproblems/blob/acf5c95a7306b819c4a13972783433d0a48f769b/openproblems/tasks/_batch_integration/_common/methods/baseline.py#L37 + """ + from sklearn.preprocessing import LabelEncoder + from sklearn.preprocessing import OneHotEncoder + + embedding = OneHotEncoder().fit_transform( + LabelEncoder().fit_transform(partition)[:, None] + ) + if jitter is not None: + embedding = embedding + np.random.uniform(-1 * jitter, jitter, embedding.shape) + return np.asarray(embedding) diff --git a/src/tasks/batch_integration/methods/bbknn/config.vsh.yaml b/src/tasks/batch_integration/methods/bbknn/config.vsh.yaml new file mode 100644 index 0000000000..8eff37339f --- /dev/null +++ b/src/tasks/batch_integration/methods/bbknn/config.vsh.yaml @@ -0,0 +1,51 @@ +# use method api spec +__merge__: ../../api/comp_method_graph.yaml +functionality: + name: bbknn + info: + label: BBKNN + summary: "BBKNN creates a k nearest neighbours graph by identifying neighbours within batches, then combining and processing them with UMAP for visualization." + description: | + "BBKNN or batch balanced k nearest neighbours graph is built for each cell by + identifying its k nearest neighbours within each defined batch separately, + creating independent neighbour sets for each cell in each batch. These sets + are then combined and processed with the UMAP algorithm for visualisation." + reference: "polanski2020bbknn" + repository_url: "https://github.com/Teichlab/bbknn" + documentation_url: "https://github.com/Teichlab/bbknn#readme" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/bbknn.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + bbknn_full_unscaled: + bbknn_full_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --annoy_n_trees + type: integer + default: 10 + description: Number of trees to use in the annoy forest. + - name: --neighbors_within_batch + type: integer + default: 3 + description: Number of neighbors to report within each batch. + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. 
+ resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - bbknn + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/methods/bbknn/script.py b/src/tasks/batch_integration/methods/bbknn/script.py new file mode 100644 index 0000000000..1496fda0bb --- /dev/null +++ b/src/tasks/batch_integration/methods/bbknn/script.py @@ -0,0 +1,63 @@ +import sys +import anndata as ad +import scanpy as sc +import bbknn + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'annoy_n_trees': 10, + 'neighbors_within_batch': 3, + 'n_hvg': 2000, +} +meta = { + 'functionality_name': 'foo', + 'config': 'bar' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +if par['n_hvg']: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var['hvg_score'].to_numpy().argsort()[::-1][:par['n_hvg']] + adata = adata[:, idx].copy() + sc.pp.pca(adata) + +print('Run BBKNN', flush=True) +kwargs = dict(batch_key='batch', copy=True) +kwargs['annoy_n_trees'] = par['annoy_n_trees'] +kwargs['neighbors_within_batch'] = par['neighbors_within_batch'] + +ad_bbknn = bbknn.bbknn(adata, **kwargs) + +print("Store output", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + obsp={ + 'connectivities': ad_bbknn.obsp['connectivities'], + 'distances': ad_bbknn.obsp['distances'], + }, + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + 'neighbors': ad_bbknn.uns['neighbors'] + } +) + +print("Store outputs", flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/combat/config.vsh.yaml b/src/tasks/batch_integration/methods/combat/config.vsh.yaml new file mode 100644 index 0000000000..f94333627d --- /dev/null +++ b/src/tasks/batch_integration/methods/combat/config.vsh.yaml @@ -0,0 +1,42 @@ +# use method api spec +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: combat + info: + label: Combat + summary: "Adjusting batch effects in microarray expression data using + empirical Bayes methods" + description: | + "An Empirical Bayes (EB) approach to correct for batch effects. It + estimates batch-specific parameters by pooling information across genes in + each batch and shrinks the estimates towards the overall mean of the batch + effect estimates across all genes. These parameters are then used to adjust + the data for batch effects, leading to more accurate and reproducible + results." 
+ reference: "hansen2012removing" + repository_url: "https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.combat.html" + documentation_url: "https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.combat.html" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/combat.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + combat_full_unscaled: + combat_full_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, highmem, lowcpu] diff --git a/src/tasks/batch_integration/methods/combat/script.py b/src/tasks/batch_integration/methods/combat/script.py new file mode 100644 index 0000000000..9f282efb9c --- /dev/null +++ b/src/tasks/batch_integration/methods/combat/script.py @@ -0,0 +1,57 @@ +import sys +import scanpy as sc +from scipy.sparse import csr_matrix + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'n_hvg': 2000, +} + +meta = { + 'functionality_name': 'foo', + 'config': 'bar' +} + +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +if par['n_hvg']: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var['hvg_score'].to_numpy().argsort()[::-1][:par['n_hvg']] + adata = adata[:, idx].copy() + + +print('Run Combat', flush=True) +adata.X = sc.pp.combat(adata, key='batch', inplace=False) + + +print("Store output", flush=True) +output = sc.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + }, + layers={ + 'corrected_counts': csr_matrix(adata.X), + } +) + +print("Store outputs", flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/fastmnn_embedding/config.vsh.yaml b/src/tasks/batch_integration/methods/fastmnn_embedding/config.vsh.yaml new file mode 100644 index 0000000000..cd885da3cd --- /dev/null +++ b/src/tasks/batch_integration/methods/fastmnn_embedding/config.vsh.yaml @@ -0,0 +1,36 @@ +# use method api spec +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: fastmnn_embedding + info: + label: fastMnn (embedding) + summary: "A simpler version of the original mnnCorrect algorithm." + description: | + The fastMNN() approach is much simpler than the original mnnCorrect() algorithm, and proceeds in several steps. + + 1. Perform a multi-sample PCA on the (cosine-)normalized expression values to reduce dimensionality. + 2. Identify MNN pairs in the low-dimensional space between a reference batch and a target batch. + 3. Remove variation along the average batch vector in both reference and target batches. + 4. Correct the cells in the target batch towards the reference, using locally weighted correction vectors. + 5. Merge the corrected target batch with the reference, and repeat with the next target batch. 
+ + reference: "haghverdi2018batch" + repository_url: "https://code.bioconductor.org/browse/batchelor/" + documentation_url: "https://bioconductor.org/packages/batchelor/" + preferred_normalization: log_cp10k + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/fastmnn.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: r_script + path: ../fastmnn_feature/script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + bioc: + - batchelor + - type: nextflow + directives: + label: [midtime, lowcpu, highmem] diff --git a/src/tasks/batch_integration/methods/fastmnn_feature/config.vsh.yaml b/src/tasks/batch_integration/methods/fastmnn_feature/config.vsh.yaml new file mode 100644 index 0000000000..e28406eb54 --- /dev/null +++ b/src/tasks/batch_integration/methods/fastmnn_feature/config.vsh.yaml @@ -0,0 +1,34 @@ +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: fastmnn_feature + info: + label: fastMnn (feature) + summary: "A simpler version of the original mnnCorrect algorithm." + description: | + The fastMNN() approach is much simpler than the original mnnCorrect() algorithm, and proceeds in several steps. + + 1. Perform a multi-sample PCA on the (cosine-)normalized expression values to reduce dimensionality. + 2. Identify MNN pairs in the low-dimensional space between a reference batch and a target batch. + 3. Remove variation along the average batch vector in both reference and target batches. + 4. Correct the cells in the target batch towards the reference, using locally weighted correction vectors. + 5. Merge the corrected target batch with the reference, and repeat with the next target batch. + + reference: "haghverdi2018batch" + repository_url: "https://code.bioconductor.org/browse/batchelor/" + documentation_url: "https://bioconductor.org/packages/batchelor/" + preferred_normalization: log_cp10k + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/fastmnn.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + bioc: batchelor + - type: nextflow + directives: + label: [midtime, lowcpu, highmem] diff --git a/src/tasks/batch_integration/methods/fastmnn_feature/script.R b/src/tasks/batch_integration/methods/fastmnn_feature/script.R new file mode 100644 index 0000000000..dbccd52d29 --- /dev/null +++ b/src/tasks/batch_integration/methods/fastmnn_feature/script.R @@ -0,0 +1,51 @@ +cat("Loading dependencies\n") +suppressPackageStartupMessages({ + requireNamespace("anndata", quietly = TRUE) + library(Matrix, warn.conflicts = FALSE) + requireNamespace("batchelor", quietly = TRUE) + library(SingleCellExperiment, warn.conflicts = FALSE) +}) +## VIASH START +par <- list( + input = 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + output = 'output.h5ad' +) +meta <- list( + functionality_name = "mnn_correct_feature" +) +## VIASH END + +cat("Read input\n") +adata <- anndata::read_h5ad(par$input) + +# TODO: pass output of 'multiBatchNorm' to fastMNN + +cat("Run mnn\n") +out <- suppressWarnings(batchelor::fastMNN( + t(adata$layers[["normalized"]]), + batch = adata$obs[["batch"]] +)) + +cat("Reformat output\n") +# reusing the same script for fastmnn_embed and fastmnn_feature +return_type <- gsub("fastmnn_", "", meta[["functionality_name"]]) + +output <- anndata::AnnData( + shape = adata$shape, + uns = list( + dataset_id = 
adata$uns[["dataset_id"]], + normalization_id = adata$uns[["normalization_id"]], + method_id = meta$functionality_name + ) +) + +if (return_type == "feature") { + layer <- as(SummarizedExperiment::assay(out, "reconstructed"), "sparseMatrix") + output$layers[["corrected_counts"]] <- t(layer) +} else if (return_type == "embedding") { + obsm <- SingleCellExperiment::reducedDim(out, "corrected") + output$obsm[["X_emb"]] <- obsm +} + +cat("Write output to file\n") +zzz <- output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/batch_integration/methods/liger/config.vsh.yaml b/src/tasks/batch_integration/methods/liger/config.vsh.yaml new file mode 100644 index 0000000000..4c638d467b --- /dev/null +++ b/src/tasks/batch_integration/methods/liger/config.vsh.yaml @@ -0,0 +1,31 @@ +# use method api spec +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: liger + info: + label: LIGER + summary: Linked Inference of Genomic Experimental Relationships + description: | + LIGER or linked inference of genomic experimental relationships uses iNMF + deriving and implementing a novel coordinate descent algorithm to efficiently + do the factorization. Joint clustering is performed and factor loadings are + normalised. + reference: welch2019single + repository_url: https://github.com/welch-lab/liger + documentation_url: https://github.com/welch-lab/liger + preferred_normalization: log_cp10k + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: apt + packages: cmake + - type: r + cran: rliger + github: welch-lab/RcppPlanc + - type: nextflow + directives: + label: [lowcpu, highmem, midtime] diff --git a/src/tasks/batch_integration/methods/liger/script.R b/src/tasks/batch_integration/methods/liger/script.R new file mode 100644 index 0000000000..b7159063ff --- /dev/null +++ b/src/tasks/batch_integration/methods/liger/script.R @@ -0,0 +1,108 @@ +cat(">> Load dependencies\n") +requireNamespace("anndata", quietly = TRUE) +requireNamespace("rliger", quietly = TRUE) + +## VIASH START +par <- list( + input = "resources_test/batch_integration/pancreas/dataset.h5ad", + output = "output.h5ad" +) +meta <- list( + functionality_name = "liger" +) +## VIASH END + +cat("Read input\n") +adata <- anndata::read_h5ad(par$input) + +anndataToLiger <- function(adata) { + # fetch batch names + batch <- adata$obs$batch + batch_names <- as.character(unique(batch)) + + # restructure data + raw_data <- lapply(batch_names, function(batch_name) { + Matrix::t(adata$layers[["counts"]][batch == batch_name, , drop = FALSE]) + }) + names(raw_data) <- batch_names + + rliger::createLiger(rawData = raw_data, removeMissing = FALSE) +} + +addNormalizedDataToLiger <- function(adata, lobj) { + norm_data <- lapply(names(rliger::rawData(lobj)), function(name) { + norm <- adata$layers[["normalized"]] + + # subset + col_names <- colnames(rliger::rawData(lobj)[[name]]) + row_names <- rownames(rliger::rawData(lobj)[[name]]) + prefix <- paste0(name, "_") + col_names <- sub(prefix, "", col_names) + + norm <- norm[ + col_names, + row_names, + drop = FALSE + ] + + # add prefix + rownames(norm) <- paste0(prefix, rownames(norm)) + + # transpose + norm <- Matrix::t(norm) + + # turn into dgcMatrix + as(as(norm, "denseMatrix"), "CsparseMatrix") + }) + names(norm_data) <- names(rliger::rawData(lobj)) + + for (name in names(rliger::rawData(lobj))) { + lobj@datasets[[name]]@normData <- norm_data[[name]] + } + + lobj +} + +cat(">> Create Liger Data 
object\n") +lobj <- anndataToLiger(adata) + +cat(">> Normalize data\n") +lobj <- addNormalizedDataToLiger(adata, lobj) + +# could also use the rliger normalization instead +# lobj <- rliger::normalize(lobj) + +cat(">> Select genes\n") +# lobj <- rliger::selectGenes(lobj) +# overwrite gene selection to include all genes +lobj@varFeatures <- adata$var_names + +cat(">> Perform scaling\n") +lobj <- rliger::scaleNotCenter(lobj, removeMissing = FALSE) + +cat(">> Joint Matrix Factorization\n") +lobj <- rliger::runIntegration(lobj, k = 20) + +cat(">> Quantile normalization\n") +lobj <- rliger::quantileNorm(lobj) + +cat(">> Store output\n") +# remove dataset names from rownames +for (name in names(rliger::rawData(lobj))) { + rownames(lobj@H.norm) <- sub(paste0(name, "_"), "", rownames(lobj@H.norm)) +} + +output <- anndata::AnnData( + uns = list( + dataset_id = adata$uns[["dataset_id"]], + normalization_id = adata$uns[["normalization_id"]], + method_id = meta$functionality_name + ), + obsm = list( + X_emb = lobj@H.norm[rownames(adata), , drop = FALSE] + ), + shape = adata$shape +) + +cat(">> Write AnnData to file\n") +zzz <- output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/batch_integration/methods/mnn_correct/config.vsh.yaml b/src/tasks/batch_integration/methods/mnn_correct/config.vsh.yaml new file mode 100644 index 0000000000..1c999fa540 --- /dev/null +++ b/src/tasks/batch_integration/methods/mnn_correct/config.vsh.yaml @@ -0,0 +1,27 @@ +# use method api spec +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: mnn_correct + info: + label: mnnCorrect + summary: "Correct for batch effects in single-cell expression data using the mutual nearest neighbors method." + description: | + We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. + Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. 
+ reference: "haghverdi2018batch" + repository_url: "https://code.bioconductor.org/browse/batchelor/" + documentation_url: "https://bioconductor.org/packages/batchelor/" + preferred_normalization: log_cp10k + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + bioc: + - batchelor + - type: nextflow + directives: + label: [midtime, lowcpu, highmem] diff --git a/src/tasks/batch_integration/methods/mnn_correct/script.R b/src/tasks/batch_integration/methods/mnn_correct/script.R new file mode 100644 index 0000000000..0e6dfa2606 --- /dev/null +++ b/src/tasks/batch_integration/methods/mnn_correct/script.R @@ -0,0 +1,47 @@ +cat("Loading dependencies\n") +suppressPackageStartupMessages({ + requireNamespace("anndata", quietly = TRUE) + library(Matrix, warn.conflicts = FALSE) + requireNamespace("batchelor", quietly = TRUE) + library(SingleCellExperiment, warn.conflicts = FALSE) +}) +## VIASH START +par <- list( + input = 'resources_test/batch_integration/pancreas/dataset.h5ad', + output = 'output.h5ad' +) +meta <- list( + functionality_name = "mnn_correct_feature" +) +## VIASH END + +cat("Read input\n") +adata <- anndata::read_h5ad(par$input) + +cat("Run mnn\n") +out <- suppressWarnings(batchelor::mnnCorrect( + t(adata$layers[["normalized"]]), + batch = adata$obs[["batch"]] +)) + +cat("Reformat output\n") +layer <- SummarizedExperiment::assay(out, "corrected") +as(t(layer), "sparseMatrix") + + + +cat("Store outputs\n") +output <- anndata::AnnData( + uns = list( + dataset_id = adata$uns[["dataset_id"]], + normalization_id = adata$uns[["normalization_id"]], + method_id = meta$functionality_name + ), + layers = list( + corrected_counts = as(t(layer), "sparseMatrix") + ), + shape = adata$shape +) + +cat("Write output to file\n") +zzz <- output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/batch_integration/methods/mnnpy/config.vsh.yaml b/src/tasks/batch_integration/methods/mnnpy/config.vsh.yaml new file mode 100644 index 0000000000..2c5075534b --- /dev/null +++ b/src/tasks/batch_integration/methods/mnnpy/config.vsh.yaml @@ -0,0 +1,52 @@ +# use method api spec +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: mnnpy + info: + label: mnnpy + summary: "Batch effect correction by matching mutual nearest neighbors, Python implementation." + description: | + An implementation of MNN correct in python featuring low memory usage, full multicore support and compatibility with the scanpy framework. + + Batch effect correction by matching mutual nearest neighbors (Haghverdi et al, 2018) has been implemented as a function 'mnnCorrect' in the R package scran. Sadly it's extremely slow for big datasets and doesn't make full use of the parallel architecture of modern CPUs. + + This project is a python implementation of the MNN correct algorithm which takes advantage of python's extendability and hackability. It seamlessly integrates with the scanpy framework and has multicore support in its bones. 
+ reference: "hie2019efficient" + repository_url: "https://github.com/chriscainx/mnnpy" + documentation_url: "https://github.com/chriscainx/mnnpy#readme" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/mnn.py + commit: 29803b95c88b4ec5921df2eec7111fd5d1a95daf + preferred_normalization: log_cp10k + variants: + mnn_full_unscaled: + mnn_full_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + resources: + - type: python_script + path: script.py +platforms: + # Due to a [ gcc-8 ] dependency in the mnnpy package, we need to use a python:3.8 image + - type: docker + image: python:3.8 + setup: + - type: apt + packages: + - procps + - type: python + pypi: + - anndata~=0.8.0 + - scanpy + - pyyaml + - requests + - jsonschema + github: + - chriscainx/mnnpy + - type: nextflow + directives: + label: [ midtime, lowcpu, lowmem ] diff --git a/src/tasks/batch_integration/methods/mnnpy/script.py b/src/tasks/batch_integration/methods/mnnpy/script.py new file mode 100644 index 0000000000..1551573650 --- /dev/null +++ b/src/tasks/batch_integration/methods/mnnpy/script.py @@ -0,0 +1,55 @@ +import anndata as ad +import mnnpy + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'n_hvg': 2000, +} +meta = { + 'functionality_name': 'foo', + 'config': 'bar' +} +## VIASH END + +print('Read input', flush=True) +adata = ad.read_h5ad(par['input']) +adata.X = adata.layers['normalized'] +del adata.layers['normalized'] +del adata.layers['counts'] + +if par['n_hvg']: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var['hvg_score'].to_numpy().argsort()[::-1][:par['n_hvg']] + adata = adata[:, idx].copy() + +print('Run mnn', flush=True) +split = [] +batch_categories = adata.obs['batch'].cat.categories +for i in batch_categories: + split.append(adata[adata.obs['batch'] == i].copy()) +corrected, _, _ = mnnpy.mnn_correct( + *split, + batch_key='batch', + batch_categories=batch_categories, + index_unique=None + ) + +print("Store outputs", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + }, + layers={ + 'corrected_counts': corrected.X, + } +) + + +print("Store outputs", flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/pyliger/config.vsh.yaml b/src/tasks/batch_integration/methods/pyliger/config.vsh.yaml new file mode 100644 index 0000000000..cf16b2e684 --- /dev/null +++ b/src/tasks/batch_integration/methods/pyliger/config.vsh.yaml @@ -0,0 +1,37 @@ +# use method api spec +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: pyliger + info: + label: pyliger + summary: Python implementation of LIGER (Linked Inference of Genomic Experimental Relationships + description: | + LIGER (installed as rliger) is a package for integrating and analyzing multiple + single-cell datasets, developed by the Macosko lab and maintained/extended by the + Welch lab. It relies on integrative non-negative matrix factorization to identify + shared and dataset-specific factors. 
+ reference: welch2019single + repository_url: https://github.com/welch-lab/pyliger + documentation_url: https://github.com/welch-lab/pyliger + preferred_normalization: log_cp10k + variants: + liger_unscaled: + liger_scaled: + preferred_normalization: log_cp10k_scaled + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - umap-learn[plot] + - pyliger + - dask-expr + - type: nextflow + directives: + label: [lowcpu, highmem, midtime] diff --git a/src/tasks/batch_integration/methods/pyliger/script.py b/src/tasks/batch_integration/methods/pyliger/script.py new file mode 100644 index 0000000000..2066e6965b --- /dev/null +++ b/src/tasks/batch_integration/methods/pyliger/script.py @@ -0,0 +1,86 @@ +import sys +import anndata as ad +import numpy as np +import pyliger + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/dataset.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'pyliger' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('>> Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/counts', + obs='obs', + var='var', + uns='uns' +) +adata.layers['norm_data'] = read_anndata(par['input'], X='layers/normalized').X + +print('>> Prepare data', flush=True) +adata_per_batch = [] +for batch in adata.obs['batch'].unique(): + adb = adata[adata.obs['batch'] == batch].copy() + + # save row sum and sum of squares for further use + norm_sum = np.ravel(np.sum(adb.layers["norm_data"], axis=0)) + norm_sum_sq = np.ravel(np.sum(adb.layers["norm_data"].power(2), axis=0)) + adb.var["norm_sum"] = norm_sum + adb.var["norm_sum_sq"] = norm_sum_sq + adb.var["norm_mean"] = norm_sum / adb.shape[0] + + # set more metadata + adb.obs.index.name = 'cell_barcode' + adb.var.index.name = 'gene_id' + adb.uns['sample_name'] = batch + + # append to list + adata_per_batch.append(adb) + +print('Create liger object', flush=True) +lobj = pyliger.create_liger( + adata_per_batch, + remove_missing=False +) + +# do not select genes +lobj.var_genes = adata.var_names + +print('>> Scaling', flush=True) +pyliger.scale_not_center(lobj, remove_missing=False) + +print('>> Optimize ALS', flush=True) +pyliger.optimize_ALS(lobj, k=20) + +print('>> Quantile normalization', flush=True) +pyliger.quantile_norm(lobj) + +print('>> Concatenate outputs', flush=True) +ad_out = ad.concat(lobj.adata_list) + +print('Store output', flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + obsm={ + 'X_emb': ad_out[adata.obs_names, :].obsm['H_norm'] + }, + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/scalex_embed/config.vsh.yaml b/src/tasks/batch_integration/methods/scalex_embed/config.vsh.yaml new file mode 100644 index 0000000000..3437df19c9 --- /dev/null +++ b/src/tasks/batch_integration/methods/scalex_embed/config.vsh.yaml @@ -0,0 +1,41 @@ +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: scalex_embed + info: + label: SCALEX (embedding) + summary: Online single-cell data integration through projecting heterogeneous datasets into a common 
cell-embedding space + description : | + SCALEX is a method for integrating heterogeneous single-cell data online using a VAE framework. Its generalised encoder disentangles batch-related components from batch-invariant biological components, which are then projected into a common cell-embedding space. + reference: xiong2021online + repository_url: https://github.com/jsxlei/SCALEX + documentation_url: https://scalex.readthedocs.io + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scalex.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + scalex_feature_unscaled: + scanorama_feature_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scalex + - numpy<1.24 + - torch<2.1 + - type: nextflow + directives: + label: [lowmem, lowcpu, midtime] diff --git a/src/tasks/batch_integration/methods/scalex_embed/script.py b/src/tasks/batch_integration/methods/scalex_embed/script.py new file mode 100644 index 0000000000..9974eba4b3 --- /dev/null +++ b/src/tasks/batch_integration/methods/scalex_embed/script.py @@ -0,0 +1,70 @@ +import sys +import anndata as ad +import scalex + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'hvg': True, +} +meta = { + 'functionality_name' : 'foo', + 'config': 'bar' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + + +if par['n_hvg']: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var['hvg_score'].to_numpy().argsort()[::-1][:par['n_hvg']] + adata = adata[:, idx].copy() + +print('Run SCALEX', flush=True) +adata = scalex.SCALEX( + adata, + batch_key="batch", + ignore_umap=True, + impute=adata.obs["batch"].cat.categories[0], + processed=True, + max_iteration=40, + min_features=None, + min_cells=None, + n_top_features=0, + outdir=None, + gpu=0, +) +adata.obsm["X_emb"] = adata.obsm["latent"] + +print("Store outputs", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + layers={ + 'corrected_counts': adata.layers["impute"], + }, + obsm={ + 'X_emb': adata.obsm['latent'], + }, + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/scalex_feature/config.vsh.yaml b/src/tasks/batch_integration/methods/scalex_feature/config.vsh.yaml new file mode 100644 index 0000000000..1874bc190e --- /dev/null +++ b/src/tasks/batch_integration/methods/scalex_feature/config.vsh.yaml @@ -0,0 +1,41 @@ +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: scalex_feature + info: + label: SCALEX (feature) + summary: Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space + description : | + SCALEX is a method for 
+ integrating heterogeneous single-cell data online using a VAE framework. Its generalised encoder disentangles batch-related components from batch-invariant biological components, which are then projected into a common cell-embedding space. + reference: xiong2021online + repository_url: https://github.com/jsxlei/SCALEX + documentation_url: https://scalex.readthedocs.io + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scalex.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + scalex_feature_unscaled: + scalex_feature_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + resources: + - type: python_script + path: ../scalex_embed/script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scalex + - numpy<1.24 + - torch<2.1 + - type: nextflow + directives: + label: [lowmem, lowcpu, midtime] diff --git a/src/tasks/batch_integration/methods/scanorama_embed/config.vsh.yaml b/src/tasks/batch_integration/methods/scanorama_embed/config.vsh.yaml new file mode 100644 index 0000000000..b5dcd8f54a --- /dev/null +++ b/src/tasks/batch_integration/methods/scanorama_embed/config.vsh.yaml @@ -0,0 +1,41 @@ +# use method api spec +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: scanorama_embed + info: + label: Scanorama (embedding) + summary: "Efficient integration of heterogeneous single-cell + transcriptomes using Scanorama" + description: | + "Scanorama is an extension of the MNN method. Unlike MNN, it finds mutual nearest neighbours over all batches and embeds observations into a joint hyperplane." + reference: "hie2019efficient" + repository_url: "https://github.com/brianhie/scanorama" + documentation_url: "https://github.com/brianhie/scanorama#readme" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scanorama.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + scanorama_embed_full_unscaled: + scanorama_embed_full_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use.
+ resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scanorama + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] \ No newline at end of file diff --git a/src/tasks/batch_integration/methods/scanorama_embed/script.py b/src/tasks/batch_integration/methods/scanorama_embed/script.py new file mode 100644 index 0000000000..db12b458d5 --- /dev/null +++ b/src/tasks/batch_integration/methods/scanorama_embed/script.py @@ -0,0 +1,87 @@ +import sys +import anndata as ad +import scanorama + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/unintegrated.h5ad', + 'output': 'output.h5ad', + 'n_hvg': 2000, +} +meta = { + 'functionality_name': 'foo', + 'config': 'bar' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +# based on scib +# -> https://github.com/theislab/scib/blob/59ae6eee5e611d9d3db067685ec96c28804e9127/scib/utils.py#L51C1-L72C62 +def merge_adata(*adata_list, **kwargs): + """Merge adatas from list while remove duplicated ``obs`` and ``var`` columns + + :param adata_list: ``anndata`` objects to be concatenated + :param kwargs: arguments to be passed to ``anndata.AnnData.concatenate`` + """ + + if len(adata_list) == 1: + return adata_list[0] + + # Make sure that adatas do not contain duplicate columns + for _adata in adata_list: + for attr in ("obs", "var"): + df = getattr(_adata, attr) + dup_mask = df.columns.duplicated() + if dup_mask.any(): + print( + f"Deleting duplicated keys `{list(df.columns[dup_mask].unique())}` from `adata.{attr}`." + ) + setattr(_adata, attr, df.loc[:, ~dup_mask]) + + return ad.AnnData.concatenate(*adata_list, **kwargs) + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) + +if par['n_hvg']: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var['hvg_score'].to_numpy().argsort()[::-1][:par['n_hvg']] + adata = adata[:, idx].copy() + +print('Run scanorama', flush=True) +split = [] +batch_categories = adata.obs['batch'].cat.categories +for i in batch_categories: + split.append(adata[adata.obs['batch'] == i].copy()) +corrected = scanorama.correct_scanpy(split, return_dimred=True) +corrected = merge_adata(*corrected, batch_key='batch', batch_categories=batch_categories, index_unique=None) + +print("Store output", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': meta['functionality_name'], + }, + layers={ + 'corrected_counts': corrected.X, + }, + obsm={ + 'X_emb': corrected.obsm["X_scanorama"], + } +) + +print("Write output to file", flush=True) +output.write(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/methods/scanorama_feature/config.vsh.yaml b/src/tasks/batch_integration/methods/scanorama_feature/config.vsh.yaml new file mode 100644 index 0000000000..3f735ddffd --- /dev/null +++ b/src/tasks/batch_integration/methods/scanorama_feature/config.vsh.yaml @@ -0,0 +1,41 @@ +# use method api spec +__merge__: ../../api/comp_method_feature.yaml +functionality: + name: scanorama_feature + info: + label: Scanorama (feature) + summary: "Efficient integration of heterogeneous single-cell + 
transcriptomes using Scanorama" + description: | + "Scanorama is an extension of the MNN method. Unlike MNN, it finds mutual nearest neighbours over all batches and embeds observations into a joint hyperplane." + reference: "hie2019efficient" + repository_url: "https://github.com/brianhie/scanorama" + documentation_url: "https://github.com/brianhie/scanorama#readme" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scanorama.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + scanorama_feature_full_unscaled: + scanorama_feature_full_scaled: + preferred_normalization: log_cp10k_scaled + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + resources: + - type: python_script + path: ../scanorama_embed/script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scanorama + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/methods/scanvi/config.vsh.yaml b/src/tasks/batch_integration/methods/scanvi/config.vsh.yaml new file mode 100644 index 0000000000..5615fd72cd --- /dev/null +++ b/src/tasks/batch_integration/methods/scanvi/config.vsh.yaml @@ -0,0 +1,61 @@ +__merge__: ../../api/comp_method_embedding.yaml + +functionality: + name: scanvi + info: + label: scANVI + summary: "scANVI is a deep learning method that considers cell type labels." + description : | + scANVI (single-cell ANnotation using Variational Inference; Python class SCANVI) is a semi-supervised model for single-cell transcriptomics data. In a sense, it can be seen as a scVI extension that leverages cell type annotations available for a subset of the cells to infer the states of the remaining cells. + reference: "lopez2018deep" + repository_url: "https://github.com/scverse/scvi-tools" + documentation_url: "https://docs.scvi-tools.org/en/stable/user_guide/models/scanvi.html" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scanvi.py + commit: 29803b95c88b4ec5921df2eec7111fd5d1a95daf + preferred_normalization: counts + variants: + scanvi_full_unscaled: + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + - name: --n_latent + type: integer + default: 30 + description: Number of latent dimensions. + - name: --n_hidden + type: integer + default: 128 + description: Number of hidden units. + - name: --n_layers + type: integer + default: 2 + description: Number of layers. + - name: --max_epochs_scvi + type: integer + example: 400 + description: Maximum number of training epochs for scVI. + - name: --max_epochs_scanvi + type: integer + example: 10 + description: Maximum number of training epochs for scANVI.
+ resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scvi-tools>=1.1.0 + - type: docker + run: | + pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu, gpu] diff --git a/src/tasks/batch_integration/methods/scanvi/script.py b/src/tasks/batch_integration/methods/scanvi/script.py new file mode 100644 index 0000000000..35d5b80f32 --- /dev/null +++ b/src/tasks/batch_integration/methods/scanvi/script.py @@ -0,0 +1,76 @@ +import sys +import anndata as ad +from scvi.model import SCVI, SCANVI + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/dataset.h5ad', + 'output': 'output.h5ad', + 'n_hvg': 2000, + 'n_latent': 30, + 'n_hidden': 128, + 'n_layers': 2, + 'max_epochs_scvi': 20, + 'max_epochs_scanvi': 20 +} +meta = { + 'functionality_name' : 'scanvi', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/counts', + obs='obs', + var='var', + uns='uns' +) + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + adata = adata[:, idx].copy() + +print("Processing data", flush=True) +SCVI.setup_anndata(adata, batch_key="batch") + +print("Run scVI", flush=True) +model_kwargs = { + key: par[key] + for key in ["n_latent", "n_hidden", "n_layers"] + if par[key] is not None +} + +vae = SCVI(adata, **model_kwargs) + +vae.train(max_epochs=par["max_epochs_scvi"], train_size=1.0) + +print('Run SCANVI', flush=True) +scanvae = SCANVI.from_scvi_model( + scvi_model=vae, + labels_key="label", + unlabeled_category="UnknownUnknown", # pick anything definitely not in a dataset +) +scanvae.train(max_epochs=par["max_epochs_scanvi"], train_size=1.0) + +print("Store outputs", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + obsm={ + "X_emb": scanvae.get_latent_representation(), + }, + uns={ + "dataset_id": adata.uns["dataset_id"], + "normalization_id": adata.uns["normalization_id"], + "method_id": meta["functionality_name"], + }, +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/batch_integration/methods/scvi/config.vsh.yaml b/src/tasks/batch_integration/methods/scvi/config.vsh.yaml new file mode 100644 index 0000000000..45eb09d5cf --- /dev/null +++ b/src/tasks/batch_integration/methods/scvi/config.vsh.yaml @@ -0,0 +1,59 @@ +# use method api spec +__merge__: ../../api/comp_method_embedding.yaml +functionality: + name: scvi + info: + label: scVI + summary: "scVI combines a variational autoencoder with a hierarchical Bayesian model." + description: | + scVI combines a variational autoencoder with a hierarchical Bayesian model. It uses the negative binomial distribution to describe gene expression of each cell, conditioned on unobserved factors and the batch variable. ScVI is run as implemented in Luecken et al. 
+ reference: "lopez2018deep" + repository_url: "https://github.com/scverse/scvi-tools" + documentation_url: "https://docs.scvi-tools.org/en/stable/user_guide/models/scvi.html" + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/scvi.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: counts + variants: + scvi_full_unscaled: + # defaults are derived from te scvi tutorial: + # https://docs.scvi-tools.org/en/stable/tutorials/notebooks/scrna/harmonization.html + arguments: + - name: --n_hvg + type: integer + default: 2000 + description: Number of highly variable genes to use. + - name: --n_latent + type: integer + default: 30 + description: Number of latent dimensions. + - name: --n_hidden + type: integer + default: 128 + description: Number of hidden units. + - name: --n_layers + type: integer + default: 2 + description: Number of layers. + - name: --max_epochs + type: integer + example: 400 + description: Maximum number of epochs. + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scvi-tools>=1.1.0 + - type: docker + run: | + pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + - type: nextflow + directives: + label: [midtime, midmem, lowcpu, gpu] diff --git a/src/tasks/batch_integration/methods/scvi/script.py b/src/tasks/batch_integration/methods/scvi/script.py new file mode 100644 index 0000000000..26490737a5 --- /dev/null +++ b/src/tasks/batch_integration/methods/scvi/script.py @@ -0,0 +1,66 @@ +import sys +import anndata as ad +from scvi.model import SCVI + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/dataset.h5ad', + 'output': 'output.h5ad', + 'n_hvg': 2000, + 'n_latent': 30, + 'n_hidden': 128, + 'n_layers': 2, + 'max_epochs': 400 +} +meta = { + 'functionality_name' : 'scvi', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/counts', + obs='obs', + var='var', + uns='uns' +) + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = adata.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + adata = adata[:, idx].copy() + +print("Processing data", flush=True) +SCVI.setup_anndata(adata, batch_key="batch") + +print("Run scVI", flush=True) +model_kwargs = { + key: par[key] + for key in ["n_latent", "n_hidden", "n_layers"] + if par[key] is not None +} + +vae = SCVI(adata, **model_kwargs) + +vae.train(max_epochs=par["max_epochs"], train_size=1.0) + +print("Store outputs", flush=True) +output = ad.AnnData( + obs=adata.obs[[]], + var=adata.var[[]], + obsm={ + "X_emb": vae.get_latent_representation(), + }, + uns={ + "dataset_id": adata.uns["dataset_id"], + "normalization_id": adata.uns["normalization_id"], + "method_id": meta["functionality_name"], + }, +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/batch_integration/metrics/asw_batch/config.vsh.yaml b/src/tasks/batch_integration/metrics/asw_batch/config.vsh.yaml new file mode 100644 index 0000000000..be6567271c --- /dev/null +++ b/src/tasks/batch_integration/metrics/asw_batch/config.vsh.yaml @@ -0,0 +1,50 @@ +# use metric api spec +__merge__: 
../../api/comp_metric_embedding.yaml +functionality: + name: asw_batch + info: + metrics: + - name: asw_batch + label: ASW batch + summary: Average silhouette of batches per cell identity label (cell type) + description: | + We consider the absolute silhouette width, s(i), on + batch labels per cell i. Here, 0 indicates that batches are well mixed, and any + deviation from 0 indicates a batch effect: + s_batch(i) = |s(i)|. + + To ensure higher scores indicate better batch mixing, these scores are scaled by + subtracting them from 1. As we expect batches to integrate within cell identity + clusters, we compute the batchASW_j score for each cell label j separately, + using the equation: + batchASW_j = 1/|C_j| Σ_{i∈C_j} (1 − s_batch(i)), + + where C_j is the set of cells with the cell label j and |C_j| denotes the number of cells + in that set. + + To obtain the final batchASW score, the label-specific batchASW_j scores are averaged: + batchASW = 1/|M| Σ_{j∈M} batchASW_j. + + Here, M is the set of unique cell labels. + reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/metrics/sil_batch.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/asw_batch/script.py b/src/tasks/batch_integration/metrics/asw_batch/script.py new file mode 100644 index 0000000000..35b110b895 --- /dev/null +++ b/src/tasks/batch_integration/metrics/asw_batch/script.py @@ -0,0 +1,44 @@ +import sys +import anndata as ad +from scib.metrics import silhouette_batch + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsm='obsm', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) +score = silhouette_batch( + adata, + batch_key='batch', + label_key='label', + embed='X_emb', +) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/metrics/asw_label/config.vsh.yaml b/src/tasks/batch_integration/metrics/asw_label/config.vsh.yaml new file mode 100644 index 0000000000..068381b9e3 --- /dev/null +++ b/src/tasks/batch_integration/metrics/asw_label/config.vsh.yaml @@ -0,0 +1,38 @@ +# use metric api spec +__merge__: ../../api/comp_metric_embedding.yaml +functionality: + name: asw_label + info: + metrics: + - name: asw_label + label: ASW Label + summary: Average silhouette of cell identity labels (cell types) + description: | + For the bio-conservation score, the ASW was
computed on cell identity labels and + scaled to a value between 0 and 1 using the equation: + celltypeASW=(ASW_C+1)/2, + + where C denotes the set of all cell identity labels. + For information about the batch silhouette score, check sil_batch. + reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/metrics/silhouette.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/asw_label/script.py b/src/tasks/batch_integration/metrics/asw_label/script.py new file mode 100644 index 0000000000..01a7a2ad41 --- /dev/null +++ b/src/tasks/batch_integration/metrics/asw_label/script.py @@ -0,0 +1,44 @@ +import sys +import anndata as ad +from scib.metrics import silhouette + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsm='obsm', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) +score = silhouette( + adata, + label_key='label', + embed='X_emb' +) + +print("Create output AnnData object", flush=True) +output = ad.AnnData( + uns={ + "dataset_id": adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + "method_id": adata.uns['method_id'], + "metric_ids": [meta['functionality_name']], + "metric_values": [score] + } +) + +print("Write data to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/batch_integration/metrics/cell_cycle_conservation/config.vsh.yaml b/src/tasks/batch_integration/metrics/cell_cycle_conservation/config.vsh.yaml new file mode 100644 index 0000000000..3852029a60 --- /dev/null +++ b/src/tasks/batch_integration/metrics/cell_cycle_conservation/config.vsh.yaml @@ -0,0 +1,47 @@ +# use metric api spec +__merge__: ../../api/comp_metric_embedding.yaml +functionality: + name: cell_cycle_conservation + info: + metrics: + - name: cell_cycle_conservation + label: Cell Cycle Conservation + summary: Cell cycle conservation score based on principle component regression on cell cycle gene scores + description: | + The cell-cycle conservation score evaluates how well the cell-cycle effect can be + captured before and after integration. We computed cell-cycle scores using Scanpy’s + score_cell_cycle function with a reference gene set from Tirosh et al for the + respective cell-cycle phases. We used the same set of cell-cycle genes for mouse and + human data (using capitalization to convert between the gene symbols). We then computed + the variance contribution of the resulting S and G2/M phase scores using principal + component regression (Principal component regression), which was performed for each + batch separately. 
The differences in variance before, Varbefore, and after, Varafter, + integration were aggregated into a final score between 0 and 1, using the equation: + CCconservation=1−|Varafter−Varbefore|/Varbefore. + + In this equation, values close to 0 indicate lower conservation and 1 indicates complete + conservation of the variance explained by cell cycle. In other words, the variance + remains unchanged within each batch for complete conservation, while any deviation from + the preintegration variance contribution reduces the score. + reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/metrics/cc_score.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/cell_cycle_conservation/script.py b/src/tasks/batch_integration/metrics/cell_cycle_conservation/script.py new file mode 100644 index 0000000000..fa432a21c6 --- /dev/null +++ b/src/tasks/batch_integration/metrics/cell_cycle_conservation/script.py @@ -0,0 +1,69 @@ +import sys +import anndata as ad +from scib.metrics import cell_cycle +import numpy as np + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad' +} + +meta = { + 'functionality_name': 'foo' +} +## VIASH END +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata_solution = read_anndata( + par['input_solution'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) +adata_integrated = read_anndata( + par['input_integrated'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print('Use gene symbols for features', flush=True) +adata_solution.var_names = adata_solution.var['feature_name'] + +translator = { + "homo_sapiens": "human", + "mus_musculus": "mouse", +} + +print('Compute score', flush=True) +if adata_solution.uns['dataset_organism'] not in translator: + score = np.nan +else: + organism = translator[adata_solution.uns['dataset_organism']] + score = cell_cycle( + adata_solution, + adata_integrated, + batch_key='batch', + embed='X_emb', + organism=organism, + ) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata_solution.uns['dataset_id'], + 'normalization_id': adata_solution.uns['normalization_id'], + 'method_id': adata_integrated.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/metrics/clustering_overlap/config.vsh.yaml b/src/tasks/batch_integration/metrics/clustering_overlap/config.vsh.yaml new file mode 100644 index 0000000000..8d92033e40 --- /dev/null +++ b/src/tasks/batch_integration/metrics/clustering_overlap/config.vsh.yaml @@ -0,0 +1,61 @@ +# use metric api spec +__merge__: ../../api/comp_metric_graph.yaml +functionality: + name: clustering_overlap + info: + metrics: + - name: ari + label: ARI + summary: Adjusted Rand Index compares clustering overlap, correcting for random labels and considering 
correct overlaps and disagreements. + description: | + The Adjusted Rand Index (ARI) compares the overlap of two clusterings; + it considers both correct clustering overlaps while also counting correct + disagreements between two clusterings. + We compared the cell-type labels with the NMI-optimized + Louvain clustering computed on the integrated dataset. + The adjustment of the Rand index corrects for randomly correct labels. + An ARI of 0 or 1 corresponds to random labeling or a perfect match, + respectively. + reference: + - hubert1985comparing + - luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/metrics/ari.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + - name: nmi + label: NMI + summary: "NMI compares overlap by scaling using mean entropy terms and optimizing Louvain clustering to obtain the best match between clusters and labels." + description: | + Normalized Mutual Information (NMI) compares the overlap of two clusterings. + We used NMI to compare the cell-type labels with Louvain clusters computed on + the integrated dataset. The overlap was scaled using the mean of the entropy terms + for cell-type and cluster labels. Thus, NMI scores of 0 or 1 correspond to uncorrelated + clustering or a perfect match, respectively. We performed optimized Louvain clustering + for this metric to obtain the best match between clusters and labels. + reference: + - amelio2015normalized + - luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/metrics/nmi.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/clustering_overlap/script.py b/src/tasks/batch_integration/metrics/clustering_overlap/script.py new file mode 100644 index 0000000000..7bb9e533c8 --- /dev/null +++ b/src/tasks/batch_integration/metrics/clustering_overlap/script.py @@ -0,0 +1,53 @@ +import sys +import anndata as ad +import scanpy as sc +from scib.metrics.clustering import cluster_optimal_resolution +from scib.metrics import ari, nmi + +## VIASH START +par = { + 'adata_integrated': 'resources_test/batch_integration/pancreas/integrated_graph.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsp='obsp', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('Run optimal Leiden clustering', flush=True) +cluster_optimal_resolution( + adata=adata, + label_key='label', + cluster_key='cluster', + cluster_function=sc.tl.leiden, +) + +print('Compute ARI score', flush=True) +ari_score = ari(adata, cluster_key='cluster', label_key='label') + +print('Compute NMI score', flush=True) +nmi_score = nmi(adata, cluster_key='cluster', label_key='label') + +print("Create output AnnData object", flush=True) +output = ad.AnnData( + uns={ + "dataset_id": adata.uns['dataset_id'], + 
'normalization_id': adata.uns['normalization_id'], + "method_id": adata.uns['method_id'], + "metric_ids": [ "ari", "nmi" ], + "metric_values": [ ari_score, nmi_score ] + } +) + +print("Write data to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/batch_integration/metrics/graph_connectivity/config.vsh.yaml b/src/tasks/batch_integration/metrics/graph_connectivity/config.vsh.yaml new file mode 100644 index 0000000000..6384feca62 --- /dev/null +++ b/src/tasks/batch_integration/metrics/graph_connectivity/config.vsh.yaml @@ -0,0 +1,47 @@ +# use metric api spec +__merge__: ../../api/comp_metric_graph.yaml +functionality: + name: graph_connectivity + info: + metrics: + - name: graph_connectivity + label: Graph Connectivity + summary: Connectivity of the subgraph per cell type label + description: | + The graph connectivity metric assesses whether the kNN graph representation, + G, of the integrated data directly connects all cells with the same cell + identity label. For each cell identity label c, we created the subset kNN + graph G(Nc;Ec) to contain only cells from a given label. Using these subset + kNN graphs, we computed the graph connectivity score using the equation: + + gc =1/|C| Σc∈C |LCC(G(Nc;Ec))|/|Nc|. + + Here, C represents the set of cell identity labels, |LCC()| is the number + of nodes in the largest connected component of the graph, and |Nc| is the + number of nodes with cell identity c. The resultant score has a range + of (0;1], where 1 indicates that all cells with the same cell identity + are connected in the integrated kNN graph, and the lowest possible score + indicates a graph where no cell is connected. As this score is computed + on the kNN graph, it can be used to evaluate all integration outputs. 
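The computation described above is small enough to sketch directly; the snippet below assumes a sparse kNN connectivity matrix and a vector of cell labels, while the component itself delegates to `scib.metrics.graph_connectivity`.

```python
# Sketch of the graph connectivity score described above (the component uses scib's implementation).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def graph_connectivity_score(connectivities: csr_matrix, labels: np.ndarray) -> float:
    scores = []
    for label in np.unique(labels):
        mask = labels == label
        sub = connectivities[mask][:, mask]  # subset kNN graph G(Nc; Ec) for this label
        _, assignment = connected_components(sub, directed=False)
        _, sizes = np.unique(assignment, return_counts=True)
        scores.append(sizes.max() / mask.sum())  # fraction of cells in the largest connected component
    return float(np.mean(scores))

# toy example: two labels, each fully connected within itself, gives a score of 1.0
conn = csr_matrix(np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]))
labels = np.array(["a", "a", "b", "b"])
print(graph_connectivity_score(conn, labels))  # 1.0
```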
+ reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: https://github.com/openproblems-bio/openproblems/blob/main/openproblems/tasks/_batch_integration/batch_integration_graph/metrics/graph_connectivity.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/graph_connectivity/script.py b/src/tasks/batch_integration/metrics/graph_connectivity/script.py new file mode 100644 index 0000000000..ead8f146bc --- /dev/null +++ b/src/tasks/batch_integration/metrics/graph_connectivity/script.py @@ -0,0 +1,42 @@ +import sys +import anndata as ad +import scib + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsp='obsp', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) +score = scib.metrics.graph_connectivity( + adata, + label_key='label' +) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/metrics/hvg_overlap/config.vsh.yaml b/src/tasks/batch_integration/metrics/hvg_overlap/config.vsh.yaml new file mode 100644 index 0000000000..a8025783d6 --- /dev/null +++ b/src/tasks/batch_integration/metrics/hvg_overlap/config.vsh.yaml @@ -0,0 +1,46 @@ +# use metric api spec +__merge__: ../../api/comp_metric_feature.yaml +functionality: + name: hvg_overlap + info: + metrics: + - name: hvg_overlap + label: HVG overlap + summary: Overlap of highly variable genes per batch before and after integration. + description: | + The HVG conservation score is a proxy for the preservation of + the biological signal. If the data integration method returned + a corrected data matrix, we computed the number of HVGs before + and after correction for each batch via Scanpy’s + highly_variable_genes function (using the ‘cell ranger’ flavor). + If available, we computed 500 HVGs per batch. If fewer than 500 + genes were present in the integrated object for a batch, + the number of HVGs was set to half the total genes in that batch. + The overlap coefficient is as follows: + overlap(𝑋,𝑌)=|𝑋∩𝑌|/min(|𝑋|,|𝑌|), + + where X and Y denote the fraction of preserved informative genes. + The overall HVG score is the mean of the per-batch HVG overlap + coefficients. 
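The overlap coefficient itself is a one-liner; a small sketch, assuming two HVG sets computed for one batch before and after integration (the component delegates to `scib.metrics.hvg_overlap`):

```python
# Sketch of the per-batch overlap coefficient described above; gene names are made up.
def overlap_coefficient(hvg_before: set, hvg_after: set) -> float:
    return len(hvg_before & hvg_after) / min(len(hvg_before), len(hvg_after))

hvg_before = {"INS", "GCG", "SST", "PPY"}
hvg_after = {"INS", "GCG", "SST", "KRT19"}
print(overlap_coefficient(hvg_before, hvg_after))  # 0.75

# the overall HVG score is then the mean of these per-batch coefficients
```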
+ reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/_batch_integration/batch_integration_feature/metrics/hvg_conservation.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/hvg_overlap/script.py b/src/tasks/batch_integration/metrics/hvg_overlap/script.py new file mode 100644 index 0000000000..b7d177e991 --- /dev/null +++ b/src/tasks/batch_integration/metrics/hvg_overlap/script.py @@ -0,0 +1,55 @@ +import sys +import anndata as ad +from scib.metrics import hvg_overlap + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata_solution = read_anndata( + par['input_solution'], + X='layers/normalized', + obs='obs', + var='var', + uns='uns' +) +adata_integrated = read_anndata( + par['input_integrated'], + X='layers/corrected_counts', + obs='obs', + var='var', + uns='uns' +) + +print('compute score', flush=True) +score = hvg_overlap( + adata_solution, + adata_integrated, + batch_key="batch" +) + +print("Create output AnnData object", flush=True) +output = ad.AnnData( + uns={ + "dataset_id": adata_solution.uns['dataset_id'], + 'normalization_id': adata_solution.uns['normalization_id'], + "method_id": adata_integrated.uns['method_id'], + "metric_ids": [meta['functionality_name']], + "metric_values": [score] + } +) + +print("Write data to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/batch_integration/metrics/isolated_label_asw/config.vsh.yaml b/src/tasks/batch_integration/metrics/isolated_label_asw/config.vsh.yaml new file mode 100644 index 0000000000..65e1970c4f --- /dev/null +++ b/src/tasks/batch_integration/metrics/isolated_label_asw/config.vsh.yaml @@ -0,0 +1,40 @@ +# use metric api spec +__merge__: ../../api/comp_metric_embedding.yaml +functionality: + name: isolated_label_asw + info: + metrics: + - name: isolated_label_asw + label: Isolated label ASW + summary: Evaluate how well isolated labels separate by average silhouette width + description: | + Isolated cell labels are defined as the labels present in the least number + of batches in the integration task. The score evaluates how well these isolated labels + separate from other cell identities. + + The isolated label ASW score is obtained by computing the + ASW of isolated versus non-isolated labels on the PCA embedding (ASW metric above) and + scaling this score to be between 0 and 1. The final score for each metric version + consists of the mean isolated score of all isolated labels. 
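A short sketch of the two ingredients described above, namely finding the isolated labels (those present in the fewest batches) and rescaling a silhouette width from [-1, 1] to [0, 1]; the column names follow the `label`/`batch` convention used by these components, and the data is synthetic.

```python
# Sketch of identifying isolated labels and rescaling ASW, as described above
# (the component itself calls scib.metrics.isolated_labels_asw).
import pandas as pd

obs = pd.DataFrame({
    "label": ["alpha", "alpha", "beta", "beta", "rare", "rare"],
    "batch": ["b1", "b2", "b1", "b2", "b1", "b1"],
})

# labels present in the fewest batches are the "isolated" labels
n_batches_per_label = obs.groupby("label")["batch"].nunique()
isolated = n_batches_per_label[n_batches_per_label == n_batches_per_label.min()].index.tolist()
print(isolated)  # ['rare']

# silhouette widths live in [-1, 1]; rescale to [0, 1] so that higher is better
def scale_asw(asw: float) -> float:
    return (asw + 1) / 2
```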
+ reference: luecken2022benchmarking + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/metrics/iso_label_sil.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + min: 0 + max: 1 + maximize: true + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/isolated_label_asw/script.py b/src/tasks/batch_integration/metrics/isolated_label_asw/script.py new file mode 100644 index 0000000000..094937e687 --- /dev/null +++ b/src/tasks/batch_integration/metrics/isolated_label_asw/script.py @@ -0,0 +1,49 @@ +import sys +import anndata as ad +from scib.metrics import isolated_labels_asw + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsm='obsm', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) + +score = isolated_labels_asw( + adata, + label_key='label', + batch_key='batch', + embed='X_emb', + iso_threshold=None, + verbose=True, +) +print(score, flush=True) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/metrics/isolated_label_f1/config.vsh.yaml b/src/tasks/batch_integration/metrics/isolated_label_f1/config.vsh.yaml new file mode 100644 index 0000000000..6b8f0703bf --- /dev/null +++ b/src/tasks/batch_integration/metrics/isolated_label_f1/config.vsh.yaml @@ -0,0 +1,52 @@ +# use metric api spec +__merge__: ../../api/comp_metric_graph.yaml +functionality: + name: isolated_label_f1 + info: + metrics: + - name: isolated_label_f1 + label: Isolated label F1 score + summary: Evaluate how well isolated labels coincide with clusters + description: | + We developed two isolated label scores to evaluate how well the data integration methods + dealt with cell identity labels shared by few batches. Specifically, we identified + isolated cell labels as the labels present in the least number of batches in the + integration task. + The score evaluates how well these isolated labels separate from other cell identities. + We implemented the isolated label metric in two versions: + (1) the best clustering of the isolated label (F1 score) and + (2) the global ASW of the isolated label. For the cluster-based score, + we first optimize the cluster assignment of the isolated label using the F1 score˚ + across louvain clustering resolutions ranging from 0.1 to 2 in resolution steps of 0.1. + The optimal F1 score for the isolated label is then used as the metric score. 
+ The F1 score is a weighted mean of precision and recall given by the equation: + 𝐹1=2×(precision×recall)/(precision+recall). + + It returns a value between 0 and 1, + where 1 shows that all of the isolated label cells and no others are captured in + the cluster. For the isolated label ASW score, we compute the ASW of isolated + versus nonisolated labels on the PCA embedding (ASW metric above) and scale this + score to be between 0 and 1. The final score for each metric version consists of + the mean isolated score of all isolated labels. + reference: luecken2022benchmarking + v1: + path: openproblems/tasks/_batch_integration/batch_integration_graph/metrics/iso_label_f1.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + min: 0 + max: 1 + maximize: true + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/isolated_label_f1/script.py b/src/tasks/batch_integration/metrics/isolated_label_f1/script.py new file mode 100644 index 0000000000..30fe25bccf --- /dev/null +++ b/src/tasks/batch_integration/metrics/isolated_label_f1/script.py @@ -0,0 +1,48 @@ +import sys +import anndata as ad +from scib.metrics import isolated_labels_f1 + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsp='obsp', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) +score = isolated_labels_f1( + adata, + label_key='label', + batch_key='batch', + embed=None, + iso_threshold=None, + verbose=True, +) +print(score, flush=True) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/metrics/kbet/config.vsh.yaml b/src/tasks/batch_integration/metrics/kbet/config.vsh.yaml new file mode 100644 index 0000000000..aca556a8fc --- /dev/null +++ b/src/tasks/batch_integration/metrics/kbet/config.vsh.yaml @@ -0,0 +1,57 @@ +# use metric api spec +__merge__: ../../api/comp_metric_embedding.yaml +functionality: + name: kbet + info: + metrics: + - name: kbet + label: kBET + summary: kBET algorithm to determine how well batches are mixed within a cell type + description: | + The kBET algorithm (v.0.99.6, release 4c9dafa) determines whether the label composition + of a k nearest neighborhood of a cell is similar to the expected (global) label + composition (Buettner et al., Nat Meth 2019). The test is repeated for a random subset + of cells, and the results are summarized as a rejection rate over all tested + neighborhoods. Thus, kBET works on a kNN graph. 
+ + We compute kNN graphs where k = 50 for joint embeddings and corrected feature outputs + via Scanpy preprocessing steps. To test for technical effects and to account for + cell-type frequency shifts across datasets, we applied kBET + separately on the batch variable for each cell identity label. Using the kBET defaults, + a k equal to the median of the number of cells per batch within each label is used for + this computation. Additionally, we set the minimum and maximum thresholds of k to 10 and + 100, respectively. As kNN graphs that have been subset by cell identity labels may no + longer be connected, we compute kBET per connected component. If >25% of cells were + assigned to connected components too small for kBET computation (smaller than k × 3), + we assigned a kBET score of 1 to denote poor batch removal. Subsequently, kBET scores + for each label were averaged and subtracted from 1 to give a final kBET score. + + In Open Problems we do not run kBET on graph outputs to avoid computation-intensive + diffusion processes being run. + reference: luecken2022benchmarking + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/metrics/kBET.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + min: 0 + max: 1 + maximize: true + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + github: theislab/kBET + - type: python + pypi: + - scib==1.1.5 + - rpy2>=3 + - anndata2ri + - scipy<=1.13 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/kbet/script.py b/src/tasks/batch_integration/metrics/kbet/script.py new file mode 100644 index 0000000000..9834f525d5 --- /dev/null +++ b/src/tasks/batch_integration/metrics/kbet/script.py @@ -0,0 +1,49 @@ +import sys +import anndata as ad +from scib.metrics import kBET + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsm='obsm', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute score', flush=True) +score = kBET( + adata, + batch_key="batch", + label_key="label", + type_="embed", + embed="X_emb", + scaled=True, + verbose=False, +) +print(score, flush=True) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/metrics/lisi/config.vsh.yaml b/src/tasks/batch_integration/metrics/lisi/config.vsh.yaml new file mode 100644 index 0000000000..750574f84a --- /dev/null +++ b/src/tasks/batch_integration/metrics/lisi/config.vsh.yaml @@ -0,0 +1,56 @@ +# use metric api spec +__merge__: ../../api/comp_metric_graph.yaml +functionality: + status: disabled + name: lisi + 
info: + metrics: + - name: ilisi + label: iLISI + summary: Local inverse Simpson's Index + description: | + Local Inverse Simpson's Index metrics adapted from Korsunsky et al. 2019 to run on + all full feature, embedding and kNN integration outputs via shortest path-based + distance computation on single-cell kNN graphs. The metric assesses whether clusters + of cells in a single-cell RNA-seq dataset are well-mixed across a categorical batch + variable. + + The original LISI score ranges from 0 to the number of categories, with the latter + indicating good cell mixing. This is rescaled to a score between 0 and 1. + reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + repository_url: https://github.com/theislab/scib/blob/ed3e2846414ca1e3dc07552c0eef1e68d82230d4/scib/metrics/lisi.py + documentation_url: https://scib.readthedocs.io/en/latest/api/scib.metrics.ilisi_graph.html + - name: clisi + label: cLISI + summary: Local inverse Simpson's Index + description: | + Local Inverse Simpson's Index metrics adapted from Korsunsky et al. 2019 to run on + all full feature, embedding and kNN integration outputs via shortest path-based + distance computation on single-cell kNN graphs. The metric assesses whether clusters + of cells in a single-cell RNA-seq dataset are well-mixed across a categorical cell type variable. + + The original LISI score ranges from 0 to the number of categories, with the latter indicating good cell mixing. This is rescaled to a score between 0 and 1. + reference: luecken2022benchmarking + min: 0 + max: 1 + maximize: true + repository_url: https://github.com/theislab/scib/blob/ed3e2846414ca1e3dc07552c0eef1e68d82230d4/scib/metrics/lisi.py + documentation_url: https://scib.readthedocs.io/en/latest/api/scib.metrics.clisi_graph.html + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - git+https://github.com/theislab/scib.git@v1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/lisi/script.py b/src/tasks/batch_integration/metrics/lisi/script.py new file mode 100644 index 0000000000..44181dab71 --- /dev/null +++ b/src/tasks/batch_integration/metrics/lisi/script.py @@ -0,0 +1,64 @@ +import sys +import numpy as np +import anndata as ad +from scib.metrics.lisi import lisi_graph_py + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata(par['input_integrated'], obs='obs', obsp='obsp', uns='uns') +adata.obs = read_anndata(par['input_solution'], obs='obs').obs +adata.uns |= read_anndata(par['input_solution'], uns='uns').uns + +print('compute iLISI score...', flush=True) +ilisi_scores = lisi_graph_py( + adata=adata, + obs_key='batch', + n_neighbors=90, + perplexity=None, + subsample=None, + n_cores=1, + verbose=False, +) +ilisi = np.nanmedian(ilisi_scores) +ilisi = (ilisi - 1) / (adata.obs['batch'].nunique() - 1) + +print('compute cLISI scores...', flush=True) +clisi_scores = lisi_graph_py( + adata=adata, + obs_key='label', + n_neighbors=90, + perplexity=None, + subsample=None, + n_cores=1, + verbose=False, +) +clisi 
= np.nanmedian(clisi_scores) +nlabs = adata.obs['label'].nunique() +clisi = (nlabs - clisi) / (nlabs - 1) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata.uns['dataset_id'], + 'normalization_id': adata.uns['normalization_id'], + 'method_id': adata.uns['method_id'], + 'metric_ids': [ 'ilisi', 'clisi' ], + 'metric_values': [ ilisi, clisi ] + } +) + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/batch_integration/metrics/pcr/config.vsh.yaml b/src/tasks/batch_integration/metrics/pcr/config.vsh.yaml new file mode 100644 index 0000000000..d3391fb528 --- /dev/null +++ b/src/tasks/batch_integration/metrics/pcr/config.vsh.yaml @@ -0,0 +1,44 @@ +# use metric api spec +__merge__: ../../api/comp_metric_embedding.yaml +functionality: + name: pcr + info: + metrics: + - name: pcr + label: PCR + summary: Compare explained variance by batch before and after integration + description: | + Principal component regression, derived from PCA, has previously been used to quantify + batch removal. Briefly, the R2 was calculated from a linear regression of the + covariate of interest (for example, the batch variable B) onto each principal component. + The variance contribution of the batch effect per principal component was then + calculated as the product of the variance explained by the ith principal component (PC) + and the corresponding R2(PCi|B). The sum across all variance contributions by the batch + effects in all principal components gives the total variance explained by the batch + variable as follows: + Var(𝐶|𝐵)=∑𝑖=1𝐺Var(𝐶|PC𝑖)×𝑅2(PC𝑖|𝐵), + + where Var(C|PCi) is the variance of the data matrix C explained by the ith principal + component. 
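The quantity Var(C|B) can be sketched directly from that definition. The snippet below is only an illustration: the component calls `scib.metrics.pcr_comparison`, which compares this quantity before and after integration, whereas `pcr_sketch` is a hypothetical helper computing the single term for a dense matrix and a batch vector:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def pcr_sketch(X, batch, n_comps=50):
    """Total variance of X explained by the batch covariate via PC regression (illustrative)."""
    pca = PCA(n_components=n_comps)
    pcs = pca.fit_transform(X)                    # cells x PCs
    var = pca.explained_variance_ratio_           # variance explained per PC
    B = pd.get_dummies(pd.Series(batch)).to_numpy(dtype=float)  # one-hot batch design
    r2 = np.array([
        LinearRegression().fit(B, pcs[:, i]).score(B, pcs[:, i])
        for i in range(pcs.shape[1])
    ])
    # Var(C|B) = sum_i Var(C|PC_i) * R^2(PC_i|B)
    return float(np.sum(var * r2))
```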
+ reference: luecken2022benchmarking + v1: + path: openproblems/tasks/_batch_integration/batch_integration_embed/metrics/pcr.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + min: 0 + max: 1 + maximize: true + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/metrics/pcr/script.py b/src/tasks/batch_integration/metrics/pcr/script.py new file mode 100644 index 0000000000..512b3dff6b --- /dev/null +++ b/src/tasks/batch_integration/metrics/pcr/script.py @@ -0,0 +1,59 @@ +import sys +import anndata as ad +from scib.metrics import pcr_comparison + +## VIASH START +par = { + 'input_integrated': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'output': 'output.h5ad', +} + +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata_solution = read_anndata( + par['input_solution'], + X='layers/normalized', + obs='obs', + var='var', + # obsm='obsm', + # varm='varm', + uns='uns' +) +adata_integrated = read_anndata( + par['input_integrated'], + obs='obs', + obsm='obsm', + uns='uns' +) + +print('compute score', flush=True) +score = pcr_comparison( + adata_solution, + adata_integrated, + embed='X_emb', + covariate='batch', + verbose=False +) + +print('Create output AnnData object', flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': adata_solution.uns['dataset_id'], + 'normalization_id': adata_solution.uns['normalization_id'], + 'method_id': adata_integrated.uns['method_id'], + 'metric_ids': [ meta['functionality_name'] ], + 'metric_values': [ score ] + } +) + + +print('Write data to file', flush=True) +output.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/process_dataset/config.vsh.yaml b/src/tasks/batch_integration/process_dataset/config.vsh.yaml new file mode 100644 index 0000000000..73ea5815c3 --- /dev/null +++ b/src/tasks/batch_integration/process_dataset/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../api/comp_process_dataset.yaml +functionality: + name: process_dataset + description: Preprocess adata object for data integration + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/subset_anndata.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scib==1.1.5 + - type: nextflow + directives: + label: [highmem, midcpu , midtime] diff --git a/src/tasks/batch_integration/process_dataset/script.py b/src/tasks/batch_integration/process_dataset/script.py new file mode 100644 index 0000000000..cf8af4c4b7 --- /dev/null +++ b/src/tasks/batch_integration/process_dataset/script.py @@ -0,0 +1,66 @@ +import sys +import anndata as ad + +## VIASH START +par = { + 'input': 'resources_test/common/pancreas/dataset.h5ad', + 'hvgs': 2000, + 'obs_label': 'cell_type', + 'obs_batch': 'batch', + 'subset_hvg': False, + 'output': 'output.h5ad' +} +meta = { + "config": "target/nextflow/batch_integration/process_dataset/.config.vsh.yaml", + "resources_dir": "src/common/helper_functions" +} +## VIASH END + +# import helper functions +sys.path.append(meta['resources_dir']) +from 
subset_anndata import read_config_slots_info, subset_anndata + +print('Read input', flush=True) +input = ad.read_h5ad(par['input']) + +def compute_batched_hvg(adata, n_hvgs): + adata = adata.copy() + adata.X = adata.layers['normalized'].copy() + if n_hvgs > adata.n_vars or n_hvgs <= 0: + hvg_list = adata.var_names.tolist() + else: + import scib + hvg_list = scib.pp.hvg_batch( + adata, + batch_key='batch', + target_genes=n_hvgs, + adataOut=False + ) + adata.var['hvg'] = adata.var_names.isin(hvg_list) + del adata.X + return adata + +print(f'Select {par["hvgs"]} highly variable genes', flush=True) +adata_with_hvg = compute_batched_hvg(input, n_hvgs=par['hvgs']) + +if par['subset_hvg']: + print('Subsetting to HVG dimensions', flush=True) + adata_with_hvg = adata_with_hvg[:, adata_with_hvg.var['hvg']].copy() + +print(">> Figuring out which data needs to be copied to which output file", flush=True) +# use par arguments to look for label and batch value in different slots +slot_mapping = { + "obs": { + "label": par["obs_label"], + "batch": par["obs_batch"], + } +} +slot_info = read_config_slots_info(meta["config"], slot_mapping) + +print(">> Create output object", flush=True) +output_dataset = subset_anndata(adata_with_hvg, slot_info["output_dataset"]) +output_solution = subset_anndata(adata_with_hvg, slot_info["output_solution"]) + +print('Writing adatas to file', flush=True) +output_dataset.write(par['output_dataset'], compression='gzip') +output_solution.write(par['output_solution'], compression='gzip') diff --git a/src/tasks/batch_integration/resources_scripts/process_datasets.sh b/src/tasks/batch_integration/resources_scripts/process_datasets.sh new file mode 100755 index 0000000000..b49c203af8 --- /dev/null +++ b/src/tasks/batch_integration/resources_scripts/process_datasets.sh @@ -0,0 +1,33 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +input_states: s3://openproblems-data/resources/datasets/**/state.yaml +rename_keys: 'input:output_dataset' +settings: '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}' +output_state: "$id/state.yaml" +publish_dir: s3://openproblems-data/resources/batch_integration/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + withLabel:highmem { + memory = '350GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/batch_integration/workflows/process_datasets/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels batch_integration,process_datasets \ No newline at end of file diff --git a/src/tasks/batch_integration/resources_scripts/run_benchmark.sh b/src/tasks/batch_integration/resources_scripts/run_benchmark.sh new file mode 100755 index 0000000000..cd83810680 --- /dev/null +++ b/src/tasks/batch_integration/resources_scripts/run_benchmark.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="s3://openproblems-data/resources/batch_integration/results/${RUN_ID}" + +cat > /tmp/params.yaml << HERE +input_states: s3://openproblems-data/resources/batch_integration/datasets/**/state.yaml +rename_keys: 'input_dataset:output_dataset,input_solution:output_solution' +output_state: "state.yaml" +publish_dir: "$publish_dir" +HERE + +tw launch 
https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/batch_integration/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config src/wf_utils/labels_tw.config \ + --labels batch_integration,full \ No newline at end of file diff --git a/src/tasks/batch_integration/resources_scripts/run_benchmark_test.sh b/src/tasks/batch_integration/resources_scripts/run_benchmark_test.sh new file mode 100755 index 0000000000..eca3049d3a --- /dev/null +++ b/src/tasks/batch_integration/resources_scripts/run_benchmark_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +input_states: s3://openproblems-data/resources_test/batch_integration/**/state.yaml +rename_keys: 'input_dataset:output_dataset,input_solution:output_solution' +output_state: "state.yaml" +publish_dir: s3://openproblems-nextflow/temp/batch_integration/ +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/batch_integration/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels batch_integration,test \ No newline at end of file diff --git a/src/tasks/batch_integration/resources_test_scripts/process.sh b/src/tasks/batch_integration/resources_test_scripts/process.sh new file mode 100755 index 0000000000..3ab0dd2a4d --- /dev/null +++ b/src/tasks/batch_integration/resources_test_scripts/process.sh @@ -0,0 +1,49 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +RAW_DATA=resources_test/common +DATASET_DIR=resources_test/batch_integration + +mkdir -p $DATASET_DIR + +# process dataset +echo Running process_dataset +nextflow run . \ + -main-script target/nextflow/batch_integration/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + --input_states "$RAW_DATA/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}' \ + --publish_dir "$DATASET_DIR" \ + --output_state '$id/state.yaml' +# output_state should be moved to settings once workaround is solved + +for id in pancreas cxg_mouse_pancreas_atlas; do + if [ ! 
-f $DATASET_DIR/$id/dataset.h5ad ]; then + echo "Dataset $id not found" + exit 1 + fi + + echo Running BBKNN on $id + viash run src/tasks/batch_integration/methods/bbknn/config.vsh.yaml -- \ + --input $DATASET_DIR/$id/dataset.h5ad \ + --output $DATASET_DIR/$id/integrated_graph.h5ad + + echo Running SCVI on $id + viash run src/tasks/batch_integration/methods/scvi/config.vsh.yaml -- \ + --input $DATASET_DIR/$id/dataset.h5ad \ + --output $DATASET_DIR/$id/integrated_embedding.h5ad + + echo Running combat on $id + viash run src/tasks/batch_integration/methods/combat/config.vsh.yaml -- \ + --input $DATASET_DIR/$id/dataset.h5ad \ + --output $DATASET_DIR/$id/integrated_feature.h5ad +done \ No newline at end of file diff --git a/src/tasks/batch_integration/transformers/embed_to_graph/config.vsh.yaml b/src/tasks/batch_integration/transformers/embed_to_graph/config.vsh.yaml new file mode 100644 index 0000000000..e841081a91 --- /dev/null +++ b/src/tasks/batch_integration/transformers/embed_to_graph/config.vsh.yaml @@ -0,0 +1,19 @@ +__merge__: ../../api/comp_transformer_embedding_to_graph.yaml +functionality: + name: embed_to_graph + info: + label: Embedding to Graph + summary: Transform an embedding to a graph output. + description: | + Transform an embedding to a graph output by applying the k nearest neighbors algorithm. + resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/transformers/embed_to_graph/script.py b/src/tasks/batch_integration/transformers/embed_to_graph/script.py new file mode 100644 index 0000000000..74166eb77c --- /dev/null +++ b/src/tasks/batch_integration/transformers/embed_to_graph/script.py @@ -0,0 +1,33 @@ +import sys +import scanpy as sc + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/integrated_embedding.h5ad', + 'ouput': 'output.h5ad' +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + obs='obs', + obsm='obsm', + uns='uns' +) + + +print('Run kNN...', flush=True) +sc.pp.neighbors(adata, use_rep='X_emb') + +print("Store outputs", flush=True) +adata.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/transformers/feature_to_embed/config.vsh.yaml b/src/tasks/batch_integration/transformers/feature_to_embed/config.vsh.yaml new file mode 100644 index 0000000000..e08013c63b --- /dev/null +++ b/src/tasks/batch_integration/transformers/feature_to_embed/config.vsh.yaml @@ -0,0 +1,20 @@ +__merge__: ../../api/comp_transformer_feature_to_embedding.yaml +functionality: + name: feature_to_embed + info: + type: transformer + label: Feature to Embedding + summary: Transform a feature output to an embedding. + description: | + Transform a feature output to an embedding by computing a PCA on the corrected counts. 
+ resources: + - type: python_script + path: script.py + - type: python_script + path: /src/common/helper_functions/read_anndata_partial.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/batch_integration/transformers/feature_to_embed/script.py b/src/tasks/batch_integration/transformers/feature_to_embed/script.py new file mode 100644 index 0000000000..0e022db8b1 --- /dev/null +++ b/src/tasks/batch_integration/transformers/feature_to_embed/script.py @@ -0,0 +1,41 @@ +import sys +import scanpy as sc + +## VIASH START +par = { + 'input': 'resources_test/batch_integration/pancreas/integrated_feature.h5ad', + 'ouput': 'output.h5ad' +} + +meta = { + 'functionality': 'foo', + 'config': 'bar' +} + +## VIASH END + +sys.path.append(meta["resources_dir"]) +from read_anndata_partial import read_anndata + + +print('Read input', flush=True) +adata = read_anndata( + par['input'], + X='layers/corrected_counts', + obs='obs', + var='var', + uns='uns' +) + + +print('Run PCA', flush=True) +adata.obsm['X_emb'] = sc.pp.pca( + adata.X, + n_comps=50, + use_highly_variable=False, # Do we want to set this to True? + svd_solver='arpack', + return_info=False +) + +print('Store outputs', flush=True) +adata.write_h5ad(par['output'], compression='gzip') \ No newline at end of file diff --git a/src/tasks/batch_integration/workflows/process_datasets/config.vsh.yaml b/src/tasks/batch_integration/workflows/process_datasets/config.vsh.yaml new file mode 100644 index 0000000000..3273e84165 --- /dev/null +++ b/src/tasks/batch_integration/workflows/process_datasets/config.vsh.yaml @@ -0,0 +1,30 @@ +functionality: + name: "process_datasets" + namespace: "batch_integration/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + __merge__: "/src/tasks/batch_integration/api/file_common_dataset.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_dataset" + __merge__: /src/tasks/batch_integration/api/file_dataset.yaml + required: true + direction: output + - name: "--output_solution" + __merge__: /src/tasks/batch_integration/api/file_solution.yaml + required: true + direction: output + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: common/check_dataset_schema + - name: batch_integration/process_dataset +platforms: + - type: nextflow diff --git a/src/tasks/batch_integration/workflows/process_datasets/main.nf b/src/tasks/batch_integration/workflows/process_datasets/main.nf new file mode 100644 index 0000000000..59cfee9f47 --- /dev/null +++ b/src/tasks/batch_integration/workflows/process_datasets/main.nf @@ -0,0 +1,54 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | check_dataset_schema.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input, + "schema": schemaYaml + ] + }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset": checks["exit_code"] == 0 ? 
state.input : null, + ] + } + ) + + // remove datasets which didn't pass the schema check + | filter { id, state -> + state.dataset != null + } + + | process_dataset.run( + fromState: [ input: "dataset" ], + toState: [ + output_dataset: "output_dataset", + output_solution: "output_solution" + ] + ) + + // only output the files for which an output file was specified + | setState(["output_dataset", "output_solution"]) + + emit: + output_ch +} diff --git a/src/tasks/batch_integration/workflows/process_datasets/run_nextflow.sh b/src/tasks/batch_integration/workflows/process_datasets/run_nextflow.sh new file mode 100755 index 0000000000..28e9382879 --- /dev/null +++ b/src/tasks/batch_integration/workflows/process_datasets/run_nextflow.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +# Run this prior to executing this script: +# bin/viash_build -q 'batch_integration' + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +export NXF_VER=22.04.5 + +nextflow run . \ + -main-script target/nextflow/batch_integration/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --id resources_test \ + --input_states "resources_test/common/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_dataset": "dataset.h5ad", "output_solution": "solution.h5ad"}' \ + --publish_dir "output/test" \ No newline at end of file diff --git a/src/tasks/batch_integration/workflows/run_benchmark/config.vsh.yaml b/src/tasks/batch_integration/workflows/run_benchmark/config.vsh.yaml new file mode 100644 index 0000000000..fd6f6811d2 --- /dev/null +++ b/src/tasks/batch_integration/workflows/run_benchmark/config.vsh.yaml @@ -0,0 +1,115 @@ +functionality: + name: "run_benchmark" + namespace: "batch_integration/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_dataset" + __merge__: /src/tasks/batch_integration/api/file_dataset.yaml + required: true + direction: input + - name: "--input_solution" + __merge__: /src/tasks/batch_integration/api/file_solution.yaml + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: score_uns.yaml + - name: "--output_method_configs" + type: file + required: true + direction: output + default: method_configs.yaml + - name: "--output_metric_configs" + type: file + required: true + direction: output + default: metric_configs.yaml + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_uns.yaml + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.yaml + - name: Methods + arguments: + - name: "--method_ids" + type: string + multiple: true + description: A list of method ids to run. If not specified, all methods will be run. 
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - type: file + path: ../../api/task_info.yaml + dependencies: + - name: common/check_dataset_schema + - name: common/extract_metadata + - name: batch_integration/methods/bbknn + - name: batch_integration/methods/combat + - name: batch_integration/methods/fastmnn_embedding + - name: batch_integration/methods/fastmnn_feature + - name: batch_integration/methods/liger + - name: batch_integration/methods/mnn_correct + - name: batch_integration/methods/mnnpy + - name: batch_integration/methods/pyliger + - name: batch_integration/methods/scalex_embed + - name: batch_integration/methods/scalex_feature + - name: batch_integration/methods/scanorama_embed + - name: batch_integration/methods/scanorama_feature + - name: batch_integration/methods/scanvi + - name: batch_integration/methods/scvi + - name: batch_integration/control_methods/no_integration/batch_embed + alias: no_integration_batch_embed + - name: batch_integration/control_methods/no_integration/global_embed + alias: no_integration_global_embed + - name: batch_integration/control_methods/no_integration/global_feature + alias: no_integration_global_feature + - name: batch_integration/control_methods/no_integration/global_graph + alias: no_integration_global_graph + - name: batch_integration/control_methods/perfect_integration/celltype_embed + alias: perfect_integration_celltype_embed + - name: batch_integration/control_methods/perfect_integration/celltype_jitter_embed + alias: perfect_integration_celltype_jitter_embed + - name: batch_integration/control_methods/random_integration/batch_embed + alias: random_integration_batch_embed + - name: batch_integration/control_methods/random_integration/batch_feature + alias: random_integration_batch_feature + - name: batch_integration/control_methods/random_integration/batch_graph + alias: random_integration_batch_graph + - name: batch_integration/control_methods/random_integration/celltype_embed + alias: random_integration_celltype_embed + - name: batch_integration/control_methods/random_integration/celltype_feature + alias: random_integration_celltype_feature + - name: batch_integration/control_methods/random_integration/celltype_graph + alias: random_integration_celltype_graph + - name: batch_integration/control_methods/random_integration/global_embed + alias: random_integration_global_embed + - name: batch_integration/control_methods/random_integration/global_feature + alias: random_integration_global_feature + - name: batch_integration/control_methods/random_integration/global_graph + alias: random_integration_global_graph + - name: batch_integration/transformers/feature_to_embed + - name: batch_integration/transformers/embed_to_graph + - name: batch_integration/metrics/asw_batch + - name: batch_integration/metrics/asw_label + - name: batch_integration/metrics/cell_cycle_conservation + - name: batch_integration/metrics/clustering_overlap + - name: batch_integration/metrics/graph_connectivity + - name: batch_integration/metrics/hvg_overlap + - name: batch_integration/metrics/isolated_label_asw + - name: batch_integration/metrics/isolated_label_f1 + - name: batch_integration/metrics/kbet + - name: batch_integration/metrics/pcr +platforms: + - type: nextflow diff --git a/src/tasks/batch_integration/workflows/run_benchmark/main.nf b/src/tasks/batch_integration/workflows/run_benchmark/main.nf new file mode 100644 index 0000000000..d86293f2a5 --- /dev/null +++ b/src/tasks/batch_integration/workflows/run_benchmark/main.nf @@ 
-0,0 +1,258 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // construct list of methods + methods = [ + bbknn, + combat, + fastmnn_embedding, + fastmnn_feature, + liger, + mnn_correct, + mnnpy, + pyliger, + scalex_embed, + scalex_feature, + scanorama_embed, + scanorama_feature, + scanvi, + scvi, + no_integration_batch_embed, + no_integration_global_embed, + no_integration_global_feature, + no_integration_global_graph, + perfect_integration_celltype_embed, + perfect_integration_celltype_jitter_embed, + random_integration_batch_embed, + random_integration_batch_feature, + random_integration_batch_graph, + random_integration_celltype_embed, + random_integration_celltype_feature, + random_integration_celltype_graph, + random_integration_global_embed, + random_integration_global_feature, + random_integration_global_graph, + ] + + // construct list of metrics + metrics = [ + asw_batch, + asw_label, + cell_cycle_conservation, + clustering_overlap, + graph_connectivity, + hvg_overlap, + isolated_label_asw, + isolated_label_f1, + kbet, + pcr + ] + + /**************************** + * EXTRACT DATASET METADATA * + ****************************/ + dataset_ch = input_ch + // store join id + | map{ id, state -> + [id, state + ["_meta": [join_id: id]]] + } + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input_solution"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + /*************************** + * RUN METHODS AND METRICS * + ***************************/ + // run all methods + method_out_ch1 = dataset_ch + | runEach( + components: methods, + + // use the 'filter' argument to only run a method on the normalisation the component is asking for + filter: { id, state, comp -> + def norm = state.dataset_uns.normalization_id + def pref = comp.config.functionality.info.preferred_normalization + // if the preferred normalisation is none at all, + // we can pass whichever dataset we want + def norm_check = (norm == "log_cp10k" && pref == "counts") || norm == pref + def method_check = !state.method_ids || state.method_ids.contains(comp.config.functionality.name) + + method_check && norm_check + }, + + // define a new 'id' by appending the method name to the dataset id + id: { id, state, comp -> + id + "." 
+ comp.config.functionality.name + }, + + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: [input: "input_dataset"], + + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + method_id: comp.config.functionality.name, + method_output: output.output, + method_subtype: comp.config.functionality.info.subtype + ] + } + ) + + + // append feature->embed transformations + method_out_ch2 = method_out_ch1 + | runEach( + components: feature_to_embed, + id: { id, state, comp -> + id + "_f2e" + }, + filter: { id, state, comp -> state.method_subtype == "feature"}, + fromState: [ input: "method_output" ], + toState: { id, output, state, comp -> + state + [ + method_output: output.output, + method_subtype: comp.config.functionality.info.subtype + ] + } + ) + | mix(method_out_ch1) + + // append embed->graph transformations + method_out_ch3 = method_out_ch2 + | runEach( + components: embed_to_graph, + id: { id, state, comp -> + id + "_e2g" + }, + filter: { id, state, comp -> state.method_subtype == "embedding"}, + fromState: [ input: "method_output" ], + toState: { id, output, state, comp -> + state + [ + method_output: output.output, + method_subtype: comp.config.functionality.info.subtype + ] + } + ) + | mix(method_out_ch2) + + // run metrics + score_ch = method_out_ch3 + | runEach( + components: metrics, + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + filter: { id, state, comp -> + state.method_subtype == comp.config.functionality.info.subtype + }, + fromState: [ + input_integrated: "method_output", + input_solution: "input_solution" + ], + toState: { id, output, state, comp -> + state + [ + metric_id: comp.config.functionality.name, + metric_output: output.output + ] + } + ) + + + /****************************** + * GENERATE OUTPUT YAML FILES * + ******************************/ + // TODO: can we store everything below in a separate helper function? 
+ + // extract the dataset metadata + dataset_meta_ch = dataset_ch + // only keep one of the normalization methods + | filter{ id, state -> + state.dataset_uns.normalization_id == "log_cp10k" + } + | joinStates { ids, states -> + // store the dataset metadata in a file + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + uns.remove("normalization_id") + uns + } + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + ["output", [output_dataset_info: dataset_uns_file]] + } + + output_ch = score_ch + // extract scores + | extract_metadata.run( + key: "extract_scores", + fromState: [input: "metric_output"], + toState: { id, output, state -> + state + [ + score_uns: readYaml(output.output).uns + ] + } + ) + + | joinStates { ids, states -> + // store the method configs in a file + def method_configs = methods.collect{it.config} + def method_configs_yaml_blob = toYamlBlob(method_configs) + def method_configs_file = tempFile("method_configs.yaml") + method_configs_file.write(method_configs_yaml_blob) + + // store the metric configs in a file + def metric_configs = metrics.collect{it.config} + def metric_configs_yaml_blob = toYamlBlob(metric_configs) + def metric_configs_file = tempFile("metric_configs.yaml") + metric_configs_file.write(metric_configs_yaml_blob) + + // store the task info in a file + def task_info_file = meta.resources_dir.resolve("task_info.yaml") + + // store the scores in a file + def score_uns = states.collect{it.score_uns} + def score_uns_yaml_blob = toYamlBlob(score_uns) + def score_uns_file = tempFile("score_uns.yaml") + score_uns_file.write(score_uns_yaml_blob) + + // create state + def new_state = [ + output_method_configs: method_configs_file, + output_metric_configs: metric_configs_file, + output_task_info: task_info_file, + output_scores: score_uns_file, + _meta: states[0]._meta + ] + + ["output", new_state] + } + + // merge all of the output data + | mix(dataset_meta_ch) + | joinStates{ ids, states -> + def mergedStates = states.inject([:]) { acc, m -> acc + m } + [ids[0], mergedStates] + } + + emit: + output_ch +} diff --git a/src/tasks/batch_integration/workflows/run_benchmark/run_test.sh b/src/tasks/batch_integration/workflows/run_benchmark/run_test.sh new file mode 100755 index 0000000000..a24ebb706f --- /dev/null +++ b/src/tasks/batch_integration/workflows/run_benchmark/run_test.sh @@ -0,0 +1,31 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# export TOWER_WORKSPACE_ID=53907369739130 + +DATASETS_DIR="resources_test/batch_integration" +OUTPUT_DIR="output/temp" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +export NXF_VER=22.04.5 +nextflow run . 
\ + -main-script target/nextflow/batch_integration/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -c src/wf_utils/labels_ci.config \ + -entry auto \ + --input_states "$DATASETS_DIR/**/state.yaml" \ + --rename_keys 'input_dataset:output_dataset,input_solution:output_solution' \ + --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml"}' \ + --publish_dir "$OUTPUT_DIR" \ + --output_state "state.yaml" \ No newline at end of file diff --git a/src/tasks/denoising/README.md b/src/tasks/denoising/README.md new file mode 100644 index 0000000000..5f33715180 --- /dev/null +++ b/src/tasks/denoising/README.md @@ -0,0 +1,357 @@ +# Denoising + + +Removing noise in sparse single-cell RNA-sequencing count data + +Path: +[`src/tasks/denoising`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/denoising) + +## Motivation + +Single-cell RNA-Seq protocols only detect a fraction of the mRNA +molecules present in each cell. As a result, the measurements (UMI +counts) observed for each gene and each cell are associated with +generally high levels of technical noise ([Grün et al., +2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes +the task of estimating the true expression level of each gene in each +cell. In the single-cell literature, this task is also referred to as +*imputation*, a term which is typically used for missing data problems +in statistics. Similar to the use of the terms “dropout”, “missing +data”, and “technical zeros”, this terminology can create confusion +about the underlying measurement process ([Sarkar and Stephens, +2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)). + +## Description + +A key challenge in evaluating denoising methods is the general lack of a +ground truth. A recent benchmark study ([Hou et al., +2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x)) +relied on flow-sorted datasets, mixture control experiments ([Tian et +al., 2019](https://www.nature.com/articles/s41592-019-0425-8)), and +comparisons with bulk RNA-Seq data. Since each of these approaches +suffers from specific limitations, it is difficult to combine these +different approaches into a single quantitative measure of denoising +accuracy. Here, we instead rely on an approach termed molecular +cross-validation (MCV), which was specifically developed to quantify +denoising accuracy in the absence of a ground truth ([Batson et al., +2019](https://www.biorxiv.org/content/10.1101/786269v1)). In MCV, the +observed molecules in a given scRNA-Seq dataset are first partitioned +between a *training* and a *test* dataset. Next, a denoising method is +applied to the training dataset. Finally, denoising accuracy is measured +by comparing the result to the test dataset. The authors show that both +in theory and in practice, the measured denoising accuracy is +representative of the accuracy that would be obtained on a ground truth +dataset. 
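As a concrete illustration of the MCV idea (a sketch of the general approach only, not the exact split performed by this task's `process_dataset` component), the observed counts can be binomially thinned into a training and a test partition:

``` python
import numpy as np
import scipy.sparse as sp

def mcv_split(counts, train_frac=0.9, seed=0):
    """Split a (cells x genes) raw count matrix into train/test partitions (illustrative)."""
    rng = np.random.default_rng(seed)
    counts = sp.csr_matrix(counts)
    train = counts.copy()
    # binomial thinning: each observed molecule lands in train with probability train_frac
    train.data = rng.binomial(counts.data.astype(np.int64), train_frac)
    test = counts.copy()
    test.data = counts.data - train.data
    train.eliminate_zeros()
    test.eliminate_zeros()
    return train, test
```

A denoising method then only ever sees the training partition, and its output is compared against the held-out test partition.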
+ +## Authors & contributors + +| name | roles | +|:------------------|:-------------------| +| Wesley Lewis | author, maintainer | +| Scott Gigante | author, maintainer | +| Robrecht Cannoodt | author | +| Kai Waldrant | author | + +## API + +``` mermaid +flowchart LR + file_common_dataset("Common dataset") + comp_process_dataset[/"Data processor"/] + file_train("Training data") + file_test("Test data") + comp_control_method[/"Control method"/] + comp_method[/"Method"/] + comp_metric[/"Metric"/] + file_denoised("Denoised data") + file_score("Score") + file_common_dataset---comp_process_dataset + comp_process_dataset-->file_train + comp_process_dataset-->file_test + file_train---comp_control_method + file_train---comp_method + file_test---comp_control_method + file_test---comp_metric + comp_control_method-->file_denoised + comp_method-->file_denoised + comp_metric-->file_score + file_denoised---comp_metric +``` + +## File format: Common dataset + +A dataset processed by the common dataset processing pipeline. + +Example file: `resources_test/common/pancreas/dataset.h5ad` + +Description: + +This dataset contains both raw counts and normalized data matrices, as +well as a PCA embedding, HVG selection and a kNN graph. + +Format: + +
+ + AnnData object + obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors' + var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score' + obsm: 'X_pca' + obsp: 'knn_distances', 'knn_connectivities' + varm: 'pca_loadings' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'pca_variance', 'knn' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------------------------------|:----------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `obs["dataset_id"]` | `string` | (*Optional*) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. | +| `obs["assay"]` | `string` | (*Optional*) Type of assay used to generate the cell data, indicating the methodology or technique employed. | +| `obs["assay_ontology_term_id"]` | `string` | (*Optional*) Experimental Factor Ontology (`EFO:`) term identifier for the assay, providing a standardized reference to the assay type. | +| `obs["cell_type"]` | `string` | (*Optional*) Classification of the cell type based on its characteristics and function within the tissue or organism. | +| `obs["cell_type_ontology_term_id"]` | `string` | (*Optional*) Cell Ontology (`CL:`) term identifier for the cell type, offering a standardized reference to the specific cell classification. | +| `obs["development_stage"]` | `string` | (*Optional*) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. | +| `obs["development_stage_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Developmental Stages (`HsapDv:`) ontology is used. If the organism is mouse (`organism_ontology_term_id == 'NCBITaxon:10090'`), then the Mouse Developmental Stages (`MmusDv:`) ontology is used. Otherwise, the Uberon (`UBERON:`) ontology is used. | +| `obs["disease"]` | `string` | (*Optional*) Information on any disease or pathological condition associated with the cell or donor. | +| `obs["disease_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). | +| `obs["donor_id"]` | `string` | (*Optional*) Identifier for the donor from whom the cell sample is obtained. | +| `obs["is_primary_data"]` | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. | +| `obs["organism"]` | `string` | (*Optional*) Organism from which the cell sample is obtained. | +| `obs["organism_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. | +| `obs["self_reported_ethnicity"]` | `string` | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. 
| +| `obs["self_reported_ethnicity_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. | +| `obs["sex"]` | `string` | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. | +| `obs["sex_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only `PATO:0000383`, `PATO:0000384` and `PATO:0001340` are allowed. | +| `obs["suspension_type"]` | `string` | (*Optional*) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. | +| `obs["tissue"]` | `string` | (*Optional*) Specific tissue from which the cells were derived, key for context and specificity in cell studies. | +| `obs["tissue_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. | +| `obs["tissue_general"]` | `string` | (*Optional*) General category or classification of the tissue, useful for broader grouping and comparison of cell data. | +| `obs["tissue_general_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. | +| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | +| `obs["soma_joinid"]` | `integer` | (*Optional*) If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the cell. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors created by the normalisation method, if any. | +| `var["feature_id"]` | `string` | (*Optional*) Unique identifier for the feature, usually a ENSEMBL gene id. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `var["soma_joinid"]` | `integer` | (*Optional*) If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the feature. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A score for the feature indicating how highly variable it is. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `obsp["knn_distances"]` | `double` | K nearest neighbors distance matrix. | +| `obsp["knn_connectivities"]` | `double` | K nearest neighbors connectivities matrix. | +| `varm["pca_loadings"]` | `double` | The PCA loadings matrix. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalised expression values. 
| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived. | +| `uns["dataset_name"]` | `string` | A human-readable name for the dataset. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["pca_variance"]` | `double` | The PCA variance objects. | +| `uns["knn"]` | `object` | Supplementary K nearest neighbors data. | + +
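For orientation, here is a minimal sketch (not part of the diff above) of how such a common dataset file might be loaded and a few of the documented slots inspected with `anndata`. It assumes the example resource `resources_test/common/pancreas/dataset.h5ad` referenced elsewhere in this repository is available locally.

```python
import anndata as ad

# Load the common dataset (example path; adjust to your local resources)
adata = ad.read_h5ad("resources_test/common/pancreas/dataset.h5ad")

# Layers documented above: raw counts and normalised expression values
print(adata.layers["counts"].shape)
print(adata.layers["normalized"].shape)

# Dataset-level metadata lives in .uns
print(adata.uns["dataset_id"], adata.uns["normalization_id"])

# Embeddings and neighbour graphs
print(adata.obsm["X_pca"].shape)
print(adata.obsp["knn_connectivities"].shape)
```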
+ +## Component type: Data processor + +Path: +[`src/denoising`](https://github.com/openproblems-bio/openproblems/tree/main/src/denoising) + +A denoising dataset processor. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------------|:-------|:------------------------------------------------------------------| +| `--input` | `file` | A dataset processed by the common dataset processing pipeline. | +| `--output_train` | `file` | (*Output*) The subset of molecules used for the training dataset. | +| `--output_test` | `file` | (*Output*) The subset of molecules used for the test dataset. | + +
+ +## File format: Training data + +The subset of molecules used for the training dataset + +Example file: `resources_test/denoising/pancreas/train.h5ad` + +Format: + +
+ + AnnData object + layers: 'counts' + uns: 'dataset_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------|:----------|:-------------------------------------| +| `layers["counts"]` | `integer` | Raw counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | + +
+ +## File format: Test data + +The subset of molecules used for the test dataset + +Example file: `resources_test/denoising/pancreas/test.h5ad` + +Format: + +
+ + AnnData object + layers: 'counts' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `layers["counts"]` | `integer` | Raw counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. | + +
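As a quick illustration (a hedged sketch, assuming the example resources above are available locally), the training and test files can be loaded and compared as follows. Note that `uns["train_sum"]` in the test file records the total number of training counts, which the Poisson metric defined later in this diff uses for rescaling.

```python
import anndata as ad

# Example paths taken from the format descriptions above
train = ad.read_h5ad("resources_test/denoising/pancreas/train.h5ad")
test = ad.read_h5ad("resources_test/denoising/pancreas/test.h5ad")

# Both files carry raw counts in layers["counts"]
print(train.layers["counts"].sum())
print(test.layers["counts"].sum())

# Total number of counts in the training split, stored alongside the test data
print(test.uns["train_sum"])
```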
+ +## Component type: Control method + +Path: +[`src/denoising/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/denoising/control_methods) + +Quality control methods for verifying the pipeline. + +Arguments: + +
+ +| Name | Type | Description | +|:----------------|:-------|:---------------------------------------------------------------| +| `--input_train` | `file` | The subset of molecules used for the training dataset. | +| `--input_test` | `file` | The subset of molecules used for the test dataset. | +| `--output` | `file` | (*Output*) A denoised dataset as output by a denoising method. | + +
+ +## Component type: Method + +Path: +[`src/denoising/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/denoising/methods) + +A denoising method. + +Arguments: + +
+ +| Name | Type | Description | +|:----------------|:-------|:---------------------------------------------------------------| +| `--input_train` | `file` | The subset of molecules used for the training dataset. | +| `--output` | `file` | (*Output*) A denoised dataset as output by a denoising method. | + +
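A minimal sketch of a script satisfying this interface is shown below. It follows the same pattern as the control method scripts included later in this diff; the identity step standing in for the actual denoising is a placeholder, not a real method, and the `par` values are hypothetical.

```python
import anndata as ad

# Hypothetical parameters; in a real component these are injected by Viash
par = {
    "input_train": "resources_test/denoising/pancreas/train.h5ad",
    "output": "denoised.h5ad",
}

adata = ad.read_h5ad(par["input_train"])

# A real method would estimate denoised expression here; as a placeholder,
# simply copy the raw counts (this is what the negative control does)
adata.layers["denoised"] = adata.layers["counts"]
adata.uns["method_id"] = "my_method"  # hypothetical method identifier

adata.write_h5ad(par["output"], compression="gzip")
```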
+ +## Component type: Metric + +Path: +[`src/denoising/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/denoising/metrics) + +A denoising metric. + +Arguments: + +
+ +| Name | Type | Description | +|:-------------------|:-------|:----------------------------------------------------| +| `--input_test` | `file` | The subset of molecules used for the test dataset. | +| `--input_denoised` | `file` | A denoised dataset as output by a denoising method. | +| `--output` | `file` | (*Output*) Metric score file. | + +
+ +## File format: Denoised data + +A denoised dataset as output by a denoising method. + +Example file: `resources_test/denoising/pancreas/denoised.h5ad` + +Format: + +
+ + AnnData object + layers: 'denoised' + uns: 'dataset_id', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------|:----------|:-------------------------------------| +| `layers["denoised"]` | `integer` | denoised data. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
+ +## File format: Score + +NA + +Example file: `resources_test/denoising/pancreas/score.h5ad` + +Description: + +Metric score file + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
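To make the expected structure concrete, below is a hedged sketch of how a metric script might assemble such a score file. The comparison used here is a toy mean absolute difference, not one of the task's actual metrics, and the file paths are the example resources listed above.

```python
import anndata as ad
import numpy as np
import scprep

test = ad.read_h5ad("resources_test/denoising/pancreas/test.h5ad")
denoised = ad.read_h5ad("resources_test/denoising/pancreas/denoised.h5ad")

# Toy score: mean absolute difference between denoised values and raw test counts
test_x = scprep.utils.toarray(test.layers["counts"])
denoised_x = scprep.utils.toarray(denoised.layers["denoised"])
score = float(np.abs(denoised_x - test_x).mean())

output = ad.AnnData(
    uns={
        "dataset_id": test.uns["dataset_id"],
        "method_id": denoised.uns["method_id"],
        "metric_ids": "toy_metric",   # one or more metric identifiers
        "metric_values": score,       # must match the length of metric_ids
    }
)
output.write_h5ad("score.h5ad", compression="gzip")
```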
+ diff --git a/src/tasks/denoising/api/comp_control_method.yaml b/src/tasks/denoising/api/comp_control_method.yaml new file mode 100644 index 0000000000..6fe13f2a35 --- /dev/null +++ b/src/tasks/denoising/api/comp_control_method.yaml @@ -0,0 +1,33 @@ +functionality: + namespace: "denoising/control_methods" + info: + type: control_method + type_info: + label: Control method + summary: Quality control methods for verifying the pipeline. + description: | + These components have the same interface as the regular methods + but also receive the solution object as input. It serves as a + starting point to test the relative accuracy of new methods in + the task, and also as a quality control for the metrics defined + in the task. + arguments: + - name: "--input_train" + __merge__: file_train.yaml + direction: input + required: true + - name: "--input_test" + __merge__: file_test.yaml + direction: input + required: true + - name: "--output" + __merge__: file_denoised.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/denoising/pancreas + dest: resources_test/denoising/pancreas \ No newline at end of file diff --git a/src/tasks/denoising/api/comp_method.yaml b/src/tasks/denoising/api/comp_method.yaml new file mode 100644 index 0000000000..517723772d --- /dev/null +++ b/src/tasks/denoising/api/comp_method.yaml @@ -0,0 +1,26 @@ +functionality: + namespace: "denoising/methods" + info: + type: method + type_info: + label: Method + summary: A denoising method. + description: | + A denoising method to remove noise (i.e. technical artifacts) from a dataset. + arguments: + - name: "--input_train" + __merge__: file_train.yaml + direction: input + required: true + - name: "--output" + __merge__: file_denoised.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/denoising/pancreas + dest: resources_test/denoising/pancreas + - path: /src/common/library.bib \ No newline at end of file diff --git a/src/tasks/denoising/api/comp_metric.yaml b/src/tasks/denoising/api/comp_metric.yaml new file mode 100644 index 0000000000..c2ef922239 --- /dev/null +++ b/src/tasks/denoising/api/comp_metric.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: "denoising/metrics" + info: + type: metric + type_info: + label: Metric + summary: A denoising metric. + description: | + A metric for evaluating denoised datasets. 
+ arguments: + - name: "--input_test" + __merge__: file_test.yaml + direction: input + required: true + - name: "--input_denoised" + __merge__: file_denoised.yaml + direction: input + required: true + - name: "--output" + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/denoising/pancreas + dest: resources_test/denoising/pancreas + - path: /src/common/library.bib + \ No newline at end of file diff --git a/src/tasks/denoising/api/comp_process_dataset.yaml b/src/tasks/denoising/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..ce6874c0ea --- /dev/null +++ b/src/tasks/denoising/api/comp_process_dataset.yaml @@ -0,0 +1,27 @@ +functionality: + namespace: "denoising" + info: + type: process_dataset + type_info: + label: Data processor + summary: A denoising dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. + arguments: + - name: "--input" + __merge__: /src/datasets/api/file_common_dataset.yaml + direction: input + required: true + - name: "--output_train" + __merge__: file_train.yaml + direction: output + required: true + - name: "--output_test" + __merge__: file_test.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas diff --git a/src/tasks/denoising/api/file_common_dataset.yaml b/src/tasks/denoising/api/file_common_dataset.yaml new file mode 100644 index 0000000000..ff913ce0de --- /dev/null +++ b/src/tasks/denoising/api/file_common_dataset.yaml @@ -0,0 +1,40 @@ +type: file +example: "resources_test/common/pancreas/dataset.h5ad" +info: + label: "Common Dataset" + summary: A subset of the common dataset. + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false diff --git a/src/tasks/denoising/api/file_denoised.yaml b/src/tasks/denoising/api/file_denoised.yaml new file mode 100644 index 0000000000..fc79694028 --- /dev/null +++ b/src/tasks/denoising/api/file_denoised.yaml @@ -0,0 +1,21 @@ +type: file +example: "resources_test/denoising/pancreas/denoised.h5ad" +info: + label: "Denoised data" + summary: A denoised dataset as output by a denoising method. 
+ slots: + layers: + - type: integer + name: denoised + description: denoised data + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + \ No newline at end of file diff --git a/src/tasks/denoising/api/file_score.yaml b/src/tasks/denoising/api/file_score.yaml new file mode 100644 index 0000000000..4f34eeb7f7 --- /dev/null +++ b/src/tasks/denoising/api/file_score.yaml @@ -0,0 +1,21 @@ +type: file +description: "Metric score file" +example: "resources_test/denoising/pancreas/score.h5ad" +info: + label: "Score" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + - type: string + name: method_id + description: "A unique identifier for the method" + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true diff --git a/src/tasks/denoising/api/file_test.yaml b/src/tasks/denoising/api/file_test.yaml new file mode 100644 index 0000000000..371b3054f7 --- /dev/null +++ b/src/tasks/denoising/api/file_test.yaml @@ -0,0 +1,44 @@ +type: file +example: "resources_test/denoising/pancreas/test.h5ad" +info: + label: "Test data" + summary: The subset of molecules used for the test dataset + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - name: train_sum + type: integer + description: The total number of counts in the training dataset. 
+ required: true \ No newline at end of file diff --git a/src/tasks/denoising/api/file_train.yaml b/src/tasks/denoising/api/file_train.yaml new file mode 100644 index 0000000000..302eae2d5c --- /dev/null +++ b/src/tasks/denoising/api/file_train.yaml @@ -0,0 +1,16 @@ +type: file +example: "resources_test/denoising/pancreas/train.h5ad" +info: + label: "Training data" + summary: The subset of molecules used for the training dataset + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true \ No newline at end of file diff --git a/src/tasks/denoising/api/task_info.yaml b/src/tasks/denoising/api/task_info.yaml new file mode 100644 index 0000000000..f7de1118f2 --- /dev/null +++ b/src/tasks/denoising/api/task_info.yaml @@ -0,0 +1,54 @@ +name: denoising +label: Denoising +v1: + path: openproblems/tasks/denoising/README.md + commit: 3fe9251ba906061b6769eed2ac9da0db5f8e26bb +summary: "Removing noise in sparse single-cell RNA-sequencing count data" +image: "thumbnail.svg" +motivation: | + Single-cell RNA-Seq protocols only detect a fraction of the mRNA molecules present + in each cell. As a result, the measurements (UMI counts) observed for each gene and each + cell are associated with generally high levels of technical noise ([Grün et al., + 2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes the task of + estimating the true expression level of each gene in each cell. In the single-cell + literature, this task is also referred to as *imputation*, a term which is typically + used for missing data problems in statistics. Similar to the use of the terms "dropout", + "missing data", and "technical zeros", this terminology can create confusion about the + underlying measurement process ([Sarkar and Stephens, + 2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)). +description: | + A key challenge in evaluating denoising methods is the general lack of a ground truth. A + recent benchmark study ([Hou et al., + 2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x)) + relied on flow-sorted datasets, mixture control experiments ([Tian et al., + 2019](https://www.nature.com/articles/s41592-019-0425-8)), and comparisons with bulk + RNA-Seq data. Since each of these approaches suffers from specific limitations, it is + difficult to combine these different approaches into a single quantitative measure of + denoising accuracy. Here, we instead rely on an approach termed molecular + cross-validation (MCV), which was specifically developed to quantify denoising accuracy + in the absence of a ground truth ([Batson et al., + 2019](https://www.biorxiv.org/content/10.1101/786269v1)). In MCV, the observed molecules + in a given scRNA-Seq dataset are first partitioned between a *training* and a *test* + dataset. Next, a denoising method is applied to the training dataset. Finally, denoising + accuracy is measured by comparing the result to the test dataset. The authors show that + both in theory and in practice, the measured denoising accuracy is representative of the + accuracy that would be obtained on a ground truth dataset. 
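As an aside, the molecular cross-validation split described above can be illustrated with a small sketch (zero-overlap case, toy counts; the repository's actual implementation is the `split_molecules` helper in `process_dataset/helper.py` further down in this diff):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy UMI counts for 3 cells x 4 genes
counts = np.array([
    [4, 0, 2, 1],
    [0, 3, 5, 0],
    [1, 1, 0, 2],
])

train_frac = 0.9
# Each molecule is independently assigned to the training split with probability train_frac
train = rng.binomial(counts, train_frac)
test = counts - train

assert (train + test == counts).all()
```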
+authors: + - name: "Wesley Lewis" + roles: [ author, maintainer ] + info: + github: wes-lewis + - name: "Scott Gigante" + roles: [ author, maintainer ] + info: + github: scottgigante + orcid: "0000-0002-4544-2764" + - name: Robrecht Cannoodt + roles: [ author ] + info: + github: rcannood + orcid: "0000-0003-3641-729X" + - name: Kai Waldrant + roles: [ author ] + info: + github: KaiWaldrant \ No newline at end of file diff --git a/src/tasks/denoising/api/thumbnail.svg b/src/tasks/denoising/api/thumbnail.svg new file mode 100644 index 0000000000..65936f0e1e --- /dev/null +++ b/src/tasks/denoising/api/thumbnail.svg @@ -0,0 +1 @@ +dim-2dim-1dim-2dim-1 \ No newline at end of file diff --git a/src/tasks/denoising/control_methods/no_denoising/config.vsh.yaml b/src/tasks/denoising/control_methods/no_denoising/config.vsh.yaml new file mode 100644 index 0000000000..64a35f9986 --- /dev/null +++ b/src/tasks/denoising/control_methods/no_denoising/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "no_denoising" + info: + label: No Denoising + summary: "negative control by copying train counts" + description: "This method serves as a negative control, where the denoised data is a copy of the unaltered training data. This represents the scoring threshold if denoising was not performed on the data." + v1: + path: openproblems/tasks/denoising/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + no_denoising: + preferred_normalization: counts + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/denoising/control_methods/no_denoising/script.py b/src/tasks/denoising/control_methods/no_denoising/script.py new file mode 100644 index 0000000000..97c9a4184c --- /dev/null +++ b/src/tasks/denoising/control_methods/no_denoising/script.py @@ -0,0 +1,22 @@ +import anndata as ad + +## VIASH START +par = { + 'input_train': 'output_train.h5ad', + 'output': 'output_ND.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) + +print("Process data", flush=True) +input_train.layers["denoised"] = input_train.layers['counts'] + +input_train.uns["method_id"] = meta['functionality_name'] + +print("Write Data", flush=True) +input_train.write_h5ad(par['output'],compression="gzip") diff --git a/src/tasks/denoising/control_methods/perfect_denoising/config.vsh.yaml b/src/tasks/denoising/control_methods/perfect_denoising/config.vsh.yaml new file mode 100644 index 0000000000..b16862360b --- /dev/null +++ b/src/tasks/denoising/control_methods/perfect_denoising/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "perfect_denoising" + info: + label: Perfect Denoising + summary: "Positive control by copying the test counts" + description: "This method serves as a positive control, where the test data is copied 1-to-1 to the denoised data. This makes it seem as if the data is perfectly denoised as it will be compared to the test data in the metrics." 
+ v1: + path: openproblems/tasks/denoising/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + perfect_denoising: + preferred_normalization: counts + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/denoising/control_methods/perfect_denoising/script.py b/src/tasks/denoising/control_methods/perfect_denoising/script.py new file mode 100644 index 0000000000..c280a4a3bc --- /dev/null +++ b/src/tasks/denoising/control_methods/perfect_denoising/script.py @@ -0,0 +1,24 @@ +import anndata as ad + +## VIASH START +par = { + 'input_train': 'resources_test/denoising/pancreas/train.h5ad', + 'input_test': 'resources_test/denoising/pancreas/test.h5ad', + 'output': 'output_PD.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Process data", flush=True) +input_train.layers["denoised"] = input_test.layers['counts'] + +input_train.uns["method_id"] = meta['functionality_name'] + +print("Write Data", flush=True) +input_train.write_h5ad(par['output'],compression="gzip") diff --git a/src/tasks/denoising/methods/alra/config.vsh.yaml b/src/tasks/denoising/methods/alra/config.vsh.yaml new file mode 100644 index 0000000000..374d317fce --- /dev/null +++ b/src/tasks/denoising/methods/alra/config.vsh.yaml @@ -0,0 +1,43 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "alra" + info: + label: ALRA + summary: "ALRA imputes missing values in scRNA-seq data by computing rank-k approximation, thresholding by gene, and rescaling the matrix." + description: | + Adaptively-thresholded Low Rank Approximation (ALRA). + + ALRA is a method for imputation of missing values in single cell RNA-sequencing data, + described in the preprint, "Zero-preserving imputation of scRNA-seq data using low-rank approximation" + available [here](https://www.biorxiv.org/content/early/2018/08/22/397588). Given a + scRNA-seq expression matrix, ALRA first computes its rank-k approximation using randomized SVD. + Next, each row (gene) is thresholded by the magnitude of the most negative value of that gene. + Finally, the matrix is rescaled. 
+ reference: "linderman2018zero" + repository_url: "https://github.com/KlugerLab/ALRA" + documentation_url: https://github.com/KlugerLab/ALRA/blob/master/README.md + v1: + path: openproblems/tasks/denoising/methods/alra.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + alra: + preferred_normalization: counts + arguments: + - name: "--norm" + type: string + choices: ["sqrt", "log"] + default: "log" + description: Normalization method + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ Matrix, rsvd ] + github: KlugerLab/ALRA + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/denoising/methods/alra/script.R b/src/tasks/denoising/methods/alra/script.R new file mode 100644 index 0000000000..9a5b237c6f --- /dev/null +++ b/src/tasks/denoising/methods/alra/script.R @@ -0,0 +1,53 @@ +cat(">> Loading dependencies\n") +library(anndata, warn.conflicts = FALSE) +library(ALRA, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input_train = "resources_test/denoising/pancreas/train.h5ad", + norm = "log", + output = "output.h5ad" +) +meta <- list( + functionality_name = "alra" +) +## VIASH END + +cat(">> Load input data\n") +input_train <- read_h5ad(par$input_train, backed = "r") + +cat(">> Set normalization method\n") +if (par$norm == "sqrt") { + norm_fn <- sqrt + denorm_fn <- function(x) x^2 +} else if (par$norm == "log") { + norm_fn <- log1p + denorm_fn <- expm1 +} else { + stop("Unknown normalization method: ", par$norm) +} + +cat(">> Normalize data\n") +data <- as.matrix(input_train$layers[["counts"]]) +totalPerCell <- rowSums(data) +data <- sweep(data, 1, totalPerCell, "/") +data <- norm_fn(data) + +cat(">> Run ALRA\n") +data <- alra(data)$A_norm_rank_k_cor_sc +data <- denorm_fn(data) +data <- sweep(data, 1, totalPerCell, "*") + +cat(">> Store output\n") +output <- AnnData( + layers = list(denoised = data), + obs = input_train$obs[, c(), drop = FALSE], + var = input_train$var[, c(), drop = FALSE], + uns = list( + dataset_id = input_train$uns[["dataset_id"]], + method_id = meta$functionality_name + ) +) + +cat(">> Write output to file\n") +output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/denoising/methods/dca/config.vsh.yaml b/src/tasks/denoising/methods/dca/config.vsh.yaml new file mode 100644 index 0000000000..33c6079866 --- /dev/null +++ b/src/tasks/denoising/methods/dca/config.vsh.yaml @@ -0,0 +1,45 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "dca" + info: + label: DCA + summary: "A deep autoencoder with ZINB loss function to address the dropout effect in count data" + description: | + "Deep Count Autoencoder + + Removes the dropout effect by taking the count structure, overdispersed nature and sparsity of the data into account + using a deep autoencoder with zero-inflated negative binomial (ZINB) loss function." 
+ reference: "eraslan2019single" + documentation_url: "https://github.com/theislab/dca#readme" + repository_url: "https://github.com/theislab/dca" + v1: + path: openproblems/tasks/denoising/methods/dca.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + dca: + preferred_normalization: counts + arguments: + - name: "--epochs" + type: "integer" + default: 300 + description: "Number of total epochs in training" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: python:3.9 + setup: + - type: apt + packages: procps + - type: python + packages: + - anndata~=0.8.0 + - scanpy + - pyyaml + - requests + - jsonschema + - "git+https://github.com/scottgigante-immunai/dca.git@patch-1" + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/denoising/methods/dca/script.py b/src/tasks/denoising/methods/dca/script.py new file mode 100644 index 0000000000..d35f3c00a5 --- /dev/null +++ b/src/tasks/denoising/methods/dca/script.py @@ -0,0 +1,39 @@ +import anndata as ad +from dca.api import dca + +## VIASH START +par = { + 'input_train': 'resources_test/denoising/pancreas/train.h5ad', + 'output': 'output_dca.h5ad', + 'epochs': 300, +} +meta = { + 'functionality_name': 'dca', +} +## VIASH END + +print("load input data", flush=True) +input_train = ad.read_h5ad(par['input_train'], backed="r") + +print("Remove unneeded data", flush=True) +output = ad.AnnData( + X=input_train.layers["counts"], + obs=input_train.obs[[]], + var=input_train.var[[]], + uns={ + "dataset_id": input_train.uns["dataset_id"], + "method_id": meta["functionality_name"] + } +) + +del input_train + +print("Run DCA", flush=True) +dca(output, epochs=par["epochs"]) + +print("Move output to correct location", flush=True) +output.layers["denoised"] = output.X +del output.X + +print("Writing data", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/denoising/methods/knn_smoothing/config.vsh.yaml b/src/tasks/denoising/methods/knn_smoothing/config.vsh.yaml new file mode 100644 index 0000000000..b0c55ae0d8 --- /dev/null +++ b/src/tasks/denoising/methods/knn_smoothing/config.vsh.yaml @@ -0,0 +1,41 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "knn_smoothing" + info: + label: KNN Smoothing + summary: "Iterative kNN-smoothing denoises scRNA-seq data by iteratively increasing the size of neighbourhoods for smoothing until a maximum k value is reached." + description: "Iterative kNN-smoothing is a method to repair or denoise noisy scRNA-seq + expression matrices. Given a scRNA-seq expression matrix, KNN-smoothing first + applies initial normalisation and smoothing. Then, a chosen number of + principal components is used to calculate Euclidean distances between cells. + Minimally sized neighbourhoods are initially determined from these Euclidean + distances, and expression profiles are shared between neighbouring cells. + Then, the resultant smoothed matrix is used as input to the next step of + smoothing, where the size (k) of the considered neighbourhoods is increased, + leading to greater smoothing. This process continues until a chosen maximum k + value has been reached, at which point the iteratively smoothed object is + then optionally scaled to yield a final result." 
+ reference: "wagner2018knearest" + documentation_url: "https://github.com/yanailab/knn-smoothing#readme" + repository_url: "https://github.com/yanailab/knn-smoothing" + v1: + path: openproblems/tasks/denoising/methods/knn_smoothing.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + knn_smoothing: + preferred_normalization: counts + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - scipy + github: + - scottgigante-immunai/knn-smoothing@python_package + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/denoising/methods/knn_smoothing/script.py b/src/tasks/denoising/methods/knn_smoothing/script.py new file mode 100644 index 0000000000..450da2012a --- /dev/null +++ b/src/tasks/denoising/methods/knn_smoothing/script.py @@ -0,0 +1,39 @@ +import knn_smooth +import anndata as ad + +## VIASH START +par = { + 'input_train': 'resources_test/denoising/pancreas/train.h5ad', + 'output': 'output_knn.h5ad', +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par["input_train"], backed="r") + +print("Remove unneeded data", flush=True) +X = input_train.layers["counts"].astype(float).transpose().toarray() + +# Create output AnnData for later use +output = ad.AnnData( + obs=input_train.obs[[]], + var=input_train.var[[]], + uns={ + "dataset_id": input_train.uns["dataset_id"], + "method_id": meta["functionality_name"] + } +) + +del input_train + +print("Run KNN smoothing", flush=True) +X = knn_smooth.knn_smoothing(X, k=10).transpose() + +print("Process data", flush=True) +output.layers["denoised"] = X + +print("Writing data", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/denoising/methods/magic/config.vsh.yaml b/src/tasks/denoising/methods/magic/config.vsh.yaml new file mode 100644 index 0000000000..380666a1b5 --- /dev/null +++ b/src/tasks/denoising/methods/magic/config.vsh.yaml @@ -0,0 +1,63 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "magic" + info: + label: MAGIC + summary: "MAGIC imputes and denoises scRNA-seq data that is noisy or dropout-prone." + description: "MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for + imputation and denoising of noisy or dropout-prone single cell RNA-sequencing + data. Given a normalised scRNA-seq expression matrix, it first calculates + Euclidean distances between each pair of cells in the dataset, which is then + augmented using a Gaussian kernel (function) and row-normalised to give a + normalised affinity matrix. A t-step markov process is then calculated, by + powering this affinity matrix t times. Finally, the powered affinity matrix + is right-multiplied by the normalised data, causing the final imputed values + to take the value of a per-gene average weighted by the affinities of cells. + The resultant imputed matrix is then rescaled, to more closely match the + magnitude of measurements in the normalised (input) matrix." 
+ reference: "van2018recovering" + documentation_url: "https://github.com/KrishnaswamyLab/MAGIC#readme" + repository_url: "https://github.com/KrishnaswamyLab/MAGIC" + v1: + path: openproblems/tasks/denoising/methods/magic.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + magic: + magic_approx: + solver: approximate + magic_knn_naive: + norm: log + decay: none + t: 1 + preferred_normalization: counts + arguments: + - name: "--solver" + type: "string" + choices: ["exact", "approximate"] + default: "exact" + description: Which solver to use. + - name: "--norm" + type: string + choices: ["sqrt", "log"] + default: "log" + description: Normalization method + - name: "--decay" + type: integer + default: 1 + description: sets decay rate of kernel tails + - name: "--t" + type: integer + default: 3 + description: power to which the diffusion operator is powered + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pip: [scprep, magic-impute, scipy, scikit-learn<1.2] + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/denoising/methods/magic/script.py b/src/tasks/denoising/methods/magic/script.py new file mode 100644 index 0000000000..075d2e21cd --- /dev/null +++ b/src/tasks/denoising/methods/magic/script.py @@ -0,0 +1,76 @@ +import anndata as ad +import numpy as np +import scprep +from magic import MAGIC +import scipy + + +## VIASH START +par = { + "input_train": "resources_test/denoising/pancreas/train.h5ad", + "output": "output_magic.h5ad", + "solver": "exact", + "norm": "sqrt", + "decay": 1, + "t": 3, +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load data", flush=True) +input_train = ad.read_h5ad(par["input_train"], backed="r") + +print("Set normalization method", flush=True) +if par["norm"] == "sqrt": + norm_fn = np.sqrt + denorm_fn = np.square +elif par["norm"] == "log": + norm_fn = np.log1p + denorm_fn = np.expm1 +else: + raise ValueError("Unknown normalization method: " + par["norm"] + ".") + +print("Remove unneeded data", flush=True) +X = input_train.layers["counts"] + +# Create output AnnData for later use +output = ad.AnnData( + obs=input_train.obs[[]], + var=input_train.var[[]], + uns={ + "dataset_id": input_train.uns["dataset_id"], + "method_id": meta["functionality_name"] + } +) + +del input_train + +print("Normalize data", flush=True) +X, libsize = scprep.normalize.library_size_normalize( + X, + rescale=1, + return_library_size=True +) +X = scprep.utils.matrix_transform(X, norm_fn) + +print("Run MAGIC", flush=True) +magic = MAGIC( + solver=par["solver"], + decay=par["decay"], + t=par["t"], + verbose=False, +) +X = magic.fit_transform(X, genes="all_genes") + +print("Denormalizing data", flush=True) +X = scprep.utils.matrix_transform(X, denorm_fn) +X = scprep.utils.matrix_vector_elementwise_multiply(X, libsize, axis=0) + +print("Create output AnnData", flush=True) +output.layers["denoised"] = X + +print("Write Data", flush=True) +output.write_h5ad(par["output"], compression="gzip") + diff --git a/src/tasks/denoising/methods/saver/config.vsh.yaml b/src/tasks/denoising/methods/saver/config.vsh.yaml new file mode 100644 index 0000000000..3c997fc36f --- /dev/null +++ b/src/tasks/denoising/methods/saver/config.vsh.yaml @@ -0,0 +1,32 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: saver + status: disabled + info: + label: SAVER + summary: SAVER (Single-cell Analysis Via Expression Recovery) 
implements a regularized regression prediction and empirical Bayes method to recover the true gene expression profile. + description: | + SAVER takes advantage of gene-to-gene relationships to recover the true expression level of each gene in each cell, + removing technical variation while retaining biological variation across cells (https://github.com/mohuangx/SAVER). + SAVER uses a post-quality-control scRNA-seq dataset with UMI counts as input. SAVER assumes that the count of each + gene in each cell follows a Poisson-gamma mixture, also known as a negative binomial model. Instead of specifying + the gamma prior, we estimate the prior parameters in an empirical Bayes-like approach with a Poisson LASSO regression, + using the expression of other genes as predictors. Once the prior parameters are estimated, SAVER outputs the + posterior distribution of the true expression, which quantifies estimation uncertainty, and the posterior mean is + used as the SAVER recovered expression value. + reference: huang2018savergene + repository_url: https://github.com/mohuangx/SAVER + documentation_url: https://mohuangx.github.io/SAVER/index.html + preferred_normalization: counts + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + github: mohuangx/SAVER + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/denoising/methods/saver/script.R b/src/tasks/denoising/methods/saver/script.R new file mode 100644 index 0000000000..f6a44f4c3a --- /dev/null +++ b/src/tasks/denoising/methods/saver/script.R @@ -0,0 +1,39 @@ +cat(">> Loading dependencies\n") +library(anndata, warn.conflicts = FALSE) +library(SAVER, warn.conflicts = FALSE) +library(Matrix, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input_train = "resources_test/denoising/pancreas/train.h5ad", + norm = "log", + output = "output.h5ad" +) +meta <- list( + functionality_name = "saver", + ncpus = 30 +) +## VIASH END + +cat(">> Load input data\n") +input_train <- read_h5ad(par$input_train, backed = "r") + +cat(">> Normalize data\n") +data <- as(t(input_train$layers[["counts"]]), "CsparseMatrix") + +cat(">> Run SAVER\n") +data <- t(saver(data, ncores = meta$ncpus, estimates.only = TRUE)) + +cat(">> Store output\n") +output <- AnnData( + layers = list(denoised = data), + obs = input_train$obs[, c(), drop = FALSE], + var = input_train$var[, c(), drop = FALSE], + uns = list( + dataset_id = input_train$uns[["dataset_id"]], + method_id = meta$functionality_name + ) +) + +cat(">> Write output to file\n") +output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/denoising/metrics/mse/config.vsh.yaml b/src/tasks/denoising/metrics/mse/config.vsh.yaml new file mode 100644 index 0000000000..8330a8de31 --- /dev/null +++ b/src/tasks/denoising/metrics/mse/config.vsh.yaml @@ -0,0 +1,30 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "mse" + info: + metrics: + - name: mse + label: Mean-squared error + summary: "The mean squared error between the denoised counts and the true counts." 
+ description: "The mean squared error between the denoised counts of the training dataset and the true counts of the test dataset after reweighing by the train/test ratio" + reference: batson2019molecular + v1: + path: openproblems/tasks/denoising/metrics/mse.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + maximize: false + min: 0 + max: "+.inf" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - scikit-learn + - scprep + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/tasks/denoising/metrics/mse/script.py b/src/tasks/denoising/metrics/mse/script.py new file mode 100644 index 0000000000..eba964f132 --- /dev/null +++ b/src/tasks/denoising/metrics/mse/script.py @@ -0,0 +1,51 @@ +import anndata as ad +import scanpy as sc +import sklearn.metrics +import scprep + +## VIASH START +par = { + 'input_test': 'resources_test/denoising/pancreas/test.h5ad', + 'input_denoised': 'resources_test/denoising/pancreas/magic.h5ad', + 'output': 'output_mse.h5ad' +} +meta = { + 'functionality_name': 'mse' +} +## VIASH END + +print("Load data", flush=True) +input_denoised = ad.read_h5ad(par['input_denoised'], backed="r") +input_test = ad.read_h5ad(par['input_test'], backed="r") + +test_data = ad.AnnData(X=input_test.layers["counts"], dtype="float") +denoised_data = ad.AnnData(X=input_denoised.layers["denoised"], dtype="float") + +print("Normalize data", flush=True) + +# scaling and transformation +target_sum = 10000 + +sc.pp.normalize_total(test_data, target_sum) +sc.pp.log1p(test_data) + +sc.pp.normalize_total(denoised_data, target_sum) +sc.pp.log1p(denoised_data) + +print("Compute mse value", flush=True) +error = sklearn.metrics.mean_squared_error( + scprep.utils.toarray(test_data.X), scprep.utils.toarray(denoised_data.X) +) + +print("Store mse value", flush=True) +output = ad.AnnData( + uns={ key: val for key, val in input_test.uns.items() }, +) + +output.uns["method_id"] = input_denoised.uns["method_id"] +output.uns["metric_ids"] = meta['functionality_name'] +output.uns["metric_values"] = error + +print("Write adata to file", flush=True) +output.write_h5ad(par['output'], compression="gzip") + diff --git a/src/tasks/denoising/metrics/poisson/config.vsh.yaml b/src/tasks/denoising/metrics/poisson/config.vsh.yaml new file mode 100644 index 0000000000..e523a9306e --- /dev/null +++ b/src/tasks/denoising/metrics/poisson/config.vsh.yaml @@ -0,0 +1,28 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "poisson" + info: + metrics: + - name: poisson + label: Poisson Loss + summary: "The Poisson log likelihood of the true counts observed in the distribution of denoised counts" + description: "The Poisson log likelihood of observing the true counts of the test dataset given the distribution given in the denoised dataset." 
+ reference: batson2019molecular + v1: + path: openproblems/tasks/denoising/metrics/poisson.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + maximize: false + min: 0 + max: "+.inf" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pip: scprep + - type: nextflow + directives: + label: [midtime, highmem, midcpu] \ No newline at end of file diff --git a/src/tasks/denoising/metrics/poisson/script.py b/src/tasks/denoising/metrics/poisson/script.py new file mode 100644 index 0000000000..537ccf0119 --- /dev/null +++ b/src/tasks/denoising/metrics/poisson/script.py @@ -0,0 +1,46 @@ +import anndata as ad +import scprep +import numpy as np + +## VIASH START +par = { + 'input_denoised': 'output_magic.h5ad', + 'input_test': 'output_test.h5ad', + 'output': 'output_poisson.h5ad' +} +meta = { + 'functionality_name': 'poisson' +} +## VIASH END + +print("Load Data", flush=True) +input_denoised = ad.read_h5ad(par['input_denoised'], backed="r") +input_test = ad.read_h5ad(par['input_test'], backed="r") + +test_data = scprep.utils.toarray(input_test.layers["counts"]) +denoised_data = scprep.utils.toarray(input_denoised.layers["denoised"]) + +print("Compute metric value", flush=True) +# scaling +initial_sum = input_test.uns["train_sum"] +target_sum = test_data.sum() +denoised_data = denoised_data * target_sum / initial_sum + +# from molecular_cross_validation.mcv_sweep import poisson_nll_loss +# copied from: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/mcv_sweep.py +def poisson_nll_loss(y_pred: np.ndarray, y_true: np.ndarray) -> float: + return (y_pred - y_true * np.log(y_pred + 1e-6)).mean() + +error = poisson_nll_loss(test_data, denoised_data) + +print("Store poisson value", flush=True) +output = ad.AnnData( + uns={ key: val for key, val in input_test.uns.items() }, +) + +output.uns["method_id"] = input_denoised.uns["method_id"] +output.uns["metric_ids"] = meta['functionality_name'] +output.uns["metric_values"] = error + +print("Write adata to file", flush=True) +output.write_h5ad(par['output'], compression="gzip") diff --git a/src/tasks/denoising/process_dataset/config.vsh.yaml b/src/tasks/denoising/process_dataset/config.vsh.yaml new file mode 100644 index 0000000000..c9b5b06c1a --- /dev/null +++ b/src/tasks/denoising/process_dataset/config.vsh.yaml @@ -0,0 +1,37 @@ +__merge__: ../api/comp_process_dataset.yaml +functionality: + name: "process_dataset" + description: | + Split data using molecular cross-validation. + + Splits molecules into two (potentially overlapping) groups using a fraction ratio. + These are output as two separate AnnData objects. + arguments: + - name: "--method" + type: "string" + description: "The process method to assign train/test." + choices: ["mcv"] + default: "mcv" + - name: "--train_frac" + type: "double" + description: "The fraction the molecules need to be split to train dataset" + default: 0.9 + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." 
+ example: 123 + resources: + - type: python_script + path: script.py + - path: helper.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - numpy + - scipy + - type: nextflow + directives: + label: [highmem, midcpu , midtime] diff --git a/src/tasks/denoising/process_dataset/helper.py b/src/tasks/denoising/process_dataset/helper.py new file mode 100644 index 0000000000..2044ed4c6e --- /dev/null +++ b/src/tasks/denoising/process_dataset/helper.py @@ -0,0 +1,55 @@ +# MIT License + +# Copyright (c) 2019 Chan Zuckerberg Biohub + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +# Copied from https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/util.py + + +from typing import Tuple + +import numpy as np + +def split_molecules( + umis: np.ndarray, + data_split: float, + overlap_factor: float = 0.0, + random_state: np.random.RandomState = None, +) -> Tuple[np.ndarray, np.ndarray]: + """Splits molecules into two (potentially overlapping) groups. 
+ :param umis: Array of molecules to split + :param data_split: Proportion of molecules to assign to the first group + :param overlap_factor: Overlap correction factor, if desired + :param random_state: For reproducible sampling + :return: umis_X and umis_Y, representing ``split`` and ``~(1 - split)`` counts + sampled from the input array + """ + if random_state is None: + random_state = np.random.RandomState() + + umis_X_disjoint = random_state.binomial(umis, data_split - overlap_factor) + umis_Y_disjoint = random_state.binomial( + umis - umis_X_disjoint, (1 - data_split) / (1 - data_split + overlap_factor) + ) + overlap_factor = umis - umis_X_disjoint - umis_Y_disjoint + umis_X = umis_X_disjoint + overlap_factor + umis_Y = umis_Y_disjoint + overlap_factor + + return umis_X, umis_Y \ No newline at end of file diff --git a/src/tasks/denoising/process_dataset/script.py b/src/tasks/denoising/process_dataset/script.py new file mode 100644 index 0000000000..94a5884046 --- /dev/null +++ b/src/tasks/denoising/process_dataset/script.py @@ -0,0 +1,75 @@ +import sys +import anndata as ad +import numpy as np + +## VIASH START +par = { + 'input': "resources_test/common/pancreas/dataset.h5ad", + 'output_train': "train.h5ad", + 'output_test': "test.h5ad", + 'train_frac': 0.9, + 'seed': 0 +} +meta = { + "functionality_name": "process_dataset", + "resources_dir": "src/tasks/denoising/process_dataset" +} +## VIASH END + +# add helper scripts to path +sys.path.append(meta["resources_dir"]) +from helper import split_molecules + +# set random state +random_state = np.random.RandomState(par['seed']) + +print(">> Load Data", flush=True) +adata = ad.read_h5ad(par["input"]) + +# remove all layers except for counts +for key in list(adata.layers.keys()): + if key != "counts": + del adata.layers[key] + +# round counts and convert to int +counts = np.array(adata.layers["counts"]).round().astype(int) + +print(">> process and split data", flush=True) +train_data, test_data = split_molecules( + counts.data, par["train_frac"], 0.0, random_state +) + +X_train = counts.copy() +X_test = counts.copy() +X_train.data = train_data +X_test.data = test_data +X_train.eliminate_zeros() +X_test.eliminate_zeros() + +# copy adata to train_set, test_set +output_train = ad.AnnData( + layers={"counts": X_train}, + obs=adata.obs[[]], + var=adata.var[[]], + uns={"dataset_id": adata.uns["dataset_id"]} +) +test_uns_keys = ["dataset_id", "dataset_name", "dataset_url", "dataset_reference", "dataset_summary", "dataset_description", "dataset_organism"] +output_test = ad.AnnData( + layers={"counts": X_test}, + obs=adata.obs[[]], + var=adata.var[[]], + uns={key: adata.uns[key] for key in test_uns_keys} +) + +# add additional information for the train set +output_test.uns["train_sum"] = X_train.sum() + +# Remove no cells that do not have enough reads +is_missing = np.array(X_train.sum(axis=0) == 0) + +output_train = output_train[:, ~is_missing.flatten()] +output_test = output_test[:, ~is_missing.flatten()] + +print(">> Write to file", flush=True) +output_train.write_h5ad(par["output_train"]) +output_test.write_h5ad(par["output_test"]) diff --git a/src/tasks/denoising/resources_scripts/process_datasets.sh b/src/tasks/denoising/resources_scripts/process_datasets.sh new file mode 100755 index 0000000000..873b9fb0b4 --- /dev/null +++ b/src/tasks/denoising/resources_scripts/process_datasets.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +id: denoising_process_datasets +input_states: 
s3://openproblems-data/resources/datasets/**/log_cp10k/state.yaml +rename_keys: 'input:output_dataset' +settings: '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad"}' +output_state: "$id/state.yaml" +publish_dir: s3://openproblems-data/resources/denoising/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + withLabel:highmem { + memory = '350GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/denoising/workflows/process_datasets/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels denoising,process_datasets \ No newline at end of file diff --git a/src/tasks/denoising/resources_scripts/run_benchmark.sh b/src/tasks/denoising/resources_scripts/run_benchmark.sh new file mode 100755 index 0000000000..8e38568ac8 --- /dev/null +++ b/src/tasks/denoising/resources_scripts/run_benchmark.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="s3://openproblems-data/resources/denoising/results/${RUN_ID}" + +# make sure only log_cp10k is used +cat > /tmp/params.yaml << HERE +input_states: s3://openproblems-data/resources/denoising/datasets/**/log_cp10k/state.yaml +rename_keys: 'input_train:output_train,input_test:output_test' +output_state: "state.yaml" +publish_dir: "$publish_dir" +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/denoising/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config src/wf_utils/labels_tw.config \ + --labels denoising,full \ No newline at end of file diff --git a/src/tasks/denoising/resources_scripts/run_benchmark_test.sh b/src/tasks/denoising/resources_scripts/run_benchmark_test.sh new file mode 100755 index 0000000000..c9023c26f1 --- /dev/null +++ b/src/tasks/denoising/resources_scripts/run_benchmark_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +input_states: s3://openproblems-data/resources_test/denoising/**/state.yaml +rename_keys: 'input_train:output_train,input_test:output_test' +output_state: "state.yaml" +publish_dir: s3://openproblems-nextflow/temp/denoising/ +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/denoising/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels denoising,test \ No newline at end of file diff --git a/src/tasks/denoising/resources_test_scripts/pancreas.sh b/src/tasks/denoising/resources_test_scripts/pancreas.sh new file mode 100755 index 0000000000..c737b39c2e --- /dev/null +++ b/src/tasks/denoising/resources_test_scripts/pancreas.sh @@ -0,0 +1,51 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +RAW_DATA=resources_test/common 
+DATASET_DIR=resources_test/denoising + +mkdir -p $DATASET_DIR + +# process dataset +echo Running process_dataset +nextflow run . \ + -main-script target/nextflow/denoising/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + --input_states "$RAW_DATA/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad"}' \ + --publish_dir "$DATASET_DIR" \ + --output_state '$id/state.yaml' + +# run one method +viash run src/tasks/denoising/methods/magic/config.vsh.yaml -- \ + --input_train $DATASET_DIR/pancreas/train.h5ad \ + --output $DATASET_DIR/pancreas/denoised.h5ad + +# run one metric +viash run src/tasks/denoising/metrics/poisson/config.vsh.yaml -- \ + --input_denoised $DATASET_DIR/pancreas/denoised.h5ad \ + --input_test $DATASET_DIR/pancreas/test.h5ad \ + --output $DATASET_DIR/pancreas/score.h5ad + +# # run benchmark +# export NXF_VER=22.04.5 + +# nextflow \ +# run . \ +# -main-script src/tasks/denoising/workflows/run/main.nf \ +# -profile docker \ +# -resume \ +# --id pancreas \ +# --input_train $DATASET_DIR/train.h5ad \ +# --input_test $DATASET_DIR/test.h5ad \ +# --output scores.tsv \ +# --publish_dir $DATASET_DIR/ \ No newline at end of file diff --git a/src/tasks/denoising/workflows/process_datasets/config.vsh.yaml b/src/tasks/denoising/workflows/process_datasets/config.vsh.yaml new file mode 100644 index 0000000000..6fc095704b --- /dev/null +++ b/src/tasks/denoising/workflows/process_datasets/config.vsh.yaml @@ -0,0 +1,30 @@ +functionality: + name: "process_datasets" + namespace: "denoising/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + required: true + example: dataset.h5ad + __merge__: "/src/tasks/denoising/api/file_common_dataset.yaml" + - name: Outputs + arguments: + - name: "--output_train" + __merge__: "/src/tasks/denoising/api/file_train.yaml" + direction: output + required: true + - name: "--output_test" + __merge__: "/src/tasks/denoising/api/file_test.yaml" + direction: output + required: true + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: common/check_dataset_schema + - name: denoising/process_dataset +platforms: + - type: nextflow diff --git a/src/tasks/denoising/workflows/process_datasets/main.nf b/src/tasks/denoising/workflows/process_datasets/main.nf new file mode 100644 index 0000000000..4437206b09 --- /dev/null +++ b/src/tasks/denoising/workflows/process_datasets/main.nf @@ -0,0 +1,54 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | check_dataset_schema.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input, + "schema": schemaYaml + ] + }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset": checks["exit_code"] == 0 ? 
state.input : null, + ] + } + ) + + // remove datasets which didn't pass the schema check + | filter { id, state -> + state.dataset != null + } + + | process_dataset.run( + fromState: [ input: "dataset" ], + toState: [ + output_train: "output_train", + output_test: "output_test" + ] + ) + + // only output the files for which an output file was specified + | setState(["output_train", "output_test"]) + + emit: + output_ch +} diff --git a/src/tasks/denoising/workflows/process_datasets/run_test.sh b/src/tasks/denoising/workflows/process_datasets/run_test.sh new file mode 100755 index 0000000000..ed8484693b --- /dev/null +++ b/src/tasks/denoising/workflows/process_datasets/run_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +# Run this prior to executing this script: +# bin/viash_build -q 'batch_integration' + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +export NXF_VER=22.04.5 + +nextflow run . \ + -main-script target/nextflow/denoising/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --id run_test \ + --input_states "resources_test/common/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_train": "train.h5ad", "output_test": "test.h5ad"}' \ + --publish_dir "resources_test/denoising" \ No newline at end of file diff --git a/src/tasks/denoising/workflows/run_benchmark/config.vsh.yaml b/src/tasks/denoising/workflows/run_benchmark/config.vsh.yaml new file mode 100644 index 0000000000..5b1cf3dd04 --- /dev/null +++ b/src/tasks/denoising/workflows/run_benchmark/config.vsh.yaml @@ -0,0 +1,67 @@ +functionality: + name: "run_benchmark" + namespace: "denoising/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_train" + __merge__: "/src/tasks/denoising/api/file_train.yaml" + required: true + direction: input + - name: "--input_test" + __merge__: "/src/tasks/denoising/api/file_test.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: score_uns.yaml + - name: "--output_method_configs" + type: file + required: true + direction: output + default: method_configs.yaml + - name: "--output_metric_configs" + type: file + required: true + direction: output + default: metric_configs.yaml + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_uns.yaml + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.yaml + - name: Methods + arguments: + - name: "--method_ids" + type: string + multiple: true + description: A list of method ids to run. If not specified, all methods will be run. 
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - type: file + path: "../../api/task_info.yaml" + dependencies: + - name: common/check_dataset_schema + - name: common/extract_metadata + - name: denoising/control_methods/no_denoising + - name: denoising/control_methods/perfect_denoising + - name: denoising/methods/alra + - name: denoising/methods/dca + - name: denoising/methods/knn_smoothing + - name: denoising/methods/magic + - name: denoising/metrics/mse + - name: denoising/metrics/poisson +platforms: + - type: nextflow diff --git a/src/tasks/denoising/workflows/run_benchmark/main.nf b/src/tasks/denoising/workflows/run_benchmark/main.nf new file mode 100644 index 0000000000..8b8f6ebd8d --- /dev/null +++ b/src/tasks/denoising/workflows/run_benchmark/main.nf @@ -0,0 +1,184 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // construct list of methods + methods = [ + no_denoising, + perfect_denoising, + alra, + dca, + knn_smoothing, + magic + ] + + // construct list of metrics + metrics = [ + mse, + poisson + ] + + /**************************** + * EXTRACT DATASET METADATA * + ****************************/ + dataset_ch = input_ch + // store join id + | map{ id, state -> + [id, state + ["_meta": [join_id: id]]] + } + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input_test"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + /*************************** + * RUN METHODS AND METRICS * + ***************************/ + score_ch = dataset_ch + + // run all methods + | runEach( + components: methods, + + // use the 'filter' argument to only run a defined method or all methods + filter: { id, state, comp -> + def method_check = !state.method_ids || state.method_ids.contains(comp.config.functionality.name) + + method_check + }, + + // define a new 'id' by appending the method name to the dataset id + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: [ + input_train: "input_train", + input_test: "input_test" + ], + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + method_id: comp.config.functionality.name, + method_output: output.output + ] + } + ) + + // run all metrics + | runEach( + components: metrics, + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: [ + input_test: "input_test", + input_denoised: "method_output" + ], + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + metric_id: comp.config.functionality.name, + metric_output: output.output + ] + } + ) + + /****************************** + * GENERATE OUTPUT YAML FILES * + ******************************/ + // TODO: can we store everything below in a separate helper function? 
+ // NOTE: the 'denoising' task doesn't use normalized data, + // so code related to normalization_ids is commented out + + // extract the dataset metadata + dataset_meta_ch = dataset_ch + // // only keep one of the normalization methods + // | filter{ id, state -> + // state.dataset_uns.normalization_id == "log_cp10k" + // } + | joinStates { ids, states -> + // store the dataset metadata in a file + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + // uns.remove("normalization_id") + uns + } + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + ["output", [output_dataset_info: dataset_uns_file]] + } + + output_ch = score_ch + + // extract the scores + | extract_metadata.run( + key: "extract_scores", + fromState: [input: "metric_output"], + toState: { id, output, state -> + state + [ + score_uns: readYaml(output.output).uns + ] + } + ) + + | joinStates { ids, states -> + // store the method configs in a file + def method_configs = methods.collect{it.config} + def method_configs_yaml_blob = toYamlBlob(method_configs) + def method_configs_file = tempFile("method_configs.yaml") + method_configs_file.write(method_configs_yaml_blob) + + // store the metric configs in a file + def metric_configs = metrics.collect{it.config} + def metric_configs_yaml_blob = toYamlBlob(metric_configs) + def metric_configs_file = tempFile("metric_configs.yaml") + metric_configs_file.write(metric_configs_yaml_blob) + + def task_info_file = meta.resources_dir.resolve("task_info.yaml") + + // store the scores in a file + def score_uns = states.collect{it.score_uns} + def score_uns_yaml_blob = toYamlBlob(score_uns) + def score_uns_file = tempFile("score_uns.yaml") + score_uns_file.write(score_uns_yaml_blob) + + def new_state = [ + output_method_configs: method_configs_file, + output_metric_configs: metric_configs_file, + output_task_info: task_info_file, + output_scores: score_uns_file, + _meta: states[0]._meta + ] + + ["output", new_state] + } + + // merge all of the output data + | mix(dataset_meta_ch) + | joinStates{ ids, states -> + def mergedStates = states.inject([:]) { acc, m -> acc + m } + [ids[0], mergedStates] + } + + emit: + output_ch +} \ No newline at end of file diff --git a/src/tasks/denoising/workflows/run_benchmark/run_test.sh b/src/tasks/denoising/workflows/run_benchmark/run_test.sh new file mode 100755 index 0000000000..9b31877c52 --- /dev/null +++ b/src/tasks/denoising/workflows/run_benchmark/run_test.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +DATASETS_DIR="resources_test/denoising" +OUTPUT_DIR="output/temp" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +export NXF_VER=22.04.5 +nextflow run . 
\ + -main-script target/nextflow/denoising/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --input_states "$DATASETS_DIR/**/state.yaml" \ + --rename_keys 'input_train:output_train,input_test:output_test' \ + --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml"}' \ + --publish_dir "$OUTPUT_DIR" \ + --output_state "state.yaml" diff --git a/src/tasks/dimensionality_reduction/README.md b/src/tasks/dimensionality_reduction/README.md new file mode 100644 index 0000000000..c5bc42e09d --- /dev/null +++ b/src/tasks/dimensionality_reduction/README.md @@ -0,0 +1,376 @@ +# Dimensionality reduction for 2D visualization + + +Reduction of high-dimensional datasets to 2D for visualization & +interpretation + +Path: +[`src/tasks/dimensionality_reduction`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/dimensionality_reduction) + +## Motivation + +Data visualisation is an important part of all stages of single-cell +analysis, from initial quality control to interpretation and +presentation of final results. For bulk RNA-seq studies, linear +dimensionality reduction techniques such as PCA and MDS are commonly +used to visualise the variation between samples. While these methods are +highly effective they can only be used to show the first few components +of variation which cannot fully represent the increased complexity and +number of observations in single-cell datasets. For this reason +non-linear techniques (most notably t-SNE and UMAP) have become the +standard for visualising single-cell studies. These methods attempt to +compress a dataset into a two-dimensional space while attempting to +capture as much of the variance between observations as possible. Many +methods for solving this problem now exist. In general these methods try +to preserve distances, while some additionally consider aspects such as +density within the embedded space or conservation of continuous +trajectories. Despite almost every single-cell study using one of these +visualisations there has been debate as to whether they can effectively +capture the variation in single-cell datasets \[@chari2023speciousart\]. + +## Description + +The dimensionality reduction task attempts to quantify the ability of +methods to embed the information present in complex single-cell studies +into a two-dimensional space. Thus, this task is specifically designed +for dimensionality reduction for visualisation and does not consider +other uses of dimensionality reduction in standard single-cell workflows +such as improving the signal-to-noise ratio (and in fact several of the +methods use PCA as a pre-processing step for this reason). Unlike most +tasks, methods for the dimensionality reduction task must accept a +matrix containing expression values normalised to 10,000 counts per cell +and log transformed (log-10k) and produce a two-dimensional coordinate +for each cell. Pre-normalised matrices are required to enforce +consistency between the metric evaluation (which generally requires +normalised data) and the method runs. When these are not consistent, +methods that use the same normalisation as used in the metric tend to +score more highly. For some methods we also evaluate the pre-processing +recommended by the method. 
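To make this contract concrete, the sketch below shows the general shape of a method script under these conventions. This is a minimal sketch only: the file paths and the method id are illustrative, and the real components in this repository receive their parameters through viash's `par` dictionary.

```python
import anndata as ad
import scanpy as sc

# Illustrative paths; real components receive these via viash's `par` dictionary.
par = {
    "input": "resources_test/dimensionality_reduction/pancreas/dataset.h5ad",
    "output": "embedding.h5ad",
}

# The task-specific dataset stores log-CP10k values in layers["normalized"].
adata = ad.read_h5ad(par["input"])

# Any approach that yields one 2D coordinate per cell satisfies the contract;
# here we simply keep the first two principal components as a stand-in.
X_emb = sc.tl.pca(adata.layers["normalized"], n_comps=2, svd_solver="arpack")

# Write the embedding in the format expected by the metrics.
output = ad.AnnData(
    obs=adata.obs[[]],
    obsm={"X_emb": X_emb},
    uns={
        "dataset_id": adata.uns["dataset_id"],
        "normalization_id": adata.uns["normalization_id"],
        "method_id": "pca_2d_sketch",  # hypothetical id, not an actual component
    },
)
output.write_h5ad(par["output"], compression="gzip")
```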
+ +## Authors & contributors + +| name | roles | +|:-----------------------|:-------------------| +| Luke Zappia | maintainer, author | +| Michal Klein | author | +| Scott Gigante | author | +| Ben DeMeo | author | +| Robrecht Cannoodt | author | +| Kai Waldrant | contributor | +| Sai Nirmayi Yasa | contributor | +| Juan A. Cordero Varela | contributor | + +## API + +``` mermaid +flowchart LR + file_common_dataset("Common dataset") + comp_process_dataset[/"Data processor"/] + file_dataset("Dataset") + file_solution("Test data") + comp_control_method[/"Control method"/] + comp_method[/"Method"/] + comp_metric[/"Metric"/] + file_embedding("Embedding") + file_score("Score") + file_common_dataset---comp_process_dataset + comp_process_dataset-->file_dataset + comp_process_dataset-->file_solution + file_dataset---comp_control_method + file_dataset---comp_method + file_solution---comp_control_method + file_solution---comp_metric + comp_control_method-->file_embedding + comp_method-->file_embedding + comp_metric-->file_score + file_embedding---comp_metric +``` + +## File format: Common dataset + +A dataset processed by the common dataset processing pipeline. + +Example file: `resources_test/common/pancreas/dataset.h5ad` + +Description: + +This dataset contains both raw counts and normalized data matrices, as +well as a PCA embedding, HVG selection and a kNN graph. + +Format: + +
+ + AnnData object + obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors' + var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score' + obsm: 'X_pca' + obsp: 'knn_distances', 'knn_connectivities' + varm: 'pca_loadings' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'pca_variance', 'knn' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------------------------------|:----------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `obs["dataset_id"]` | `string` | (*Optional*) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. | +| `obs["assay"]` | `string` | (*Optional*) Type of assay used to generate the cell data, indicating the methodology or technique employed. | +| `obs["assay_ontology_term_id"]` | `string` | (*Optional*) Experimental Factor Ontology (`EFO:`) term identifier for the assay, providing a standardized reference to the assay type. | +| `obs["cell_type"]` | `string` | (*Optional*) Classification of the cell type based on its characteristics and function within the tissue or organism. | +| `obs["cell_type_ontology_term_id"]` | `string` | (*Optional*) Cell Ontology (`CL:`) term identifier for the cell type, offering a standardized reference to the specific cell classification. | +| `obs["development_stage"]` | `string` | (*Optional*) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. | +| `obs["development_stage_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Developmental Stages (`HsapDv:`) ontology is used. If the organism is mouse (`organism_ontology_term_id == 'NCBITaxon:10090'`), then the Mouse Developmental Stages (`MmusDv:`) ontology is used. Otherwise, the Uberon (`UBERON:`) ontology is used. | +| `obs["disease"]` | `string` | (*Optional*) Information on any disease or pathological condition associated with the cell or donor. | +| `obs["disease_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (`MONDO:`) ontology term, or `PATO:0000461` from the Phenotype And Trait Ontology (`PATO:`). | +| `obs["donor_id"]` | `string` | (*Optional*) Identifier for the donor from whom the cell sample is obtained. | +| `obs["is_primary_data"]` | `boolean` | (*Optional*) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. | +| `obs["organism"]` | `string` | (*Optional*) Organism from which the cell sample is obtained. | +| `obs["organism_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (`NCBITaxon:`) which is a child of `NCBITaxon:33208`. | +| `obs["self_reported_ethnicity"]` | `string` | (*Optional*) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. 
| +| `obs["self_reported_ethnicity_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (`organism_ontology_term_id == 'NCBITaxon:9606'`), then the Human Ancestry Ontology (`HANCESTRO:`) is used. | +| `obs["sex"]` | `string` | (*Optional*) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. | +| `obs["sex_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only `PATO:0000383`, `PATO:0000384` and `PATO:0001340` are allowed. | +| `obs["suspension_type"]` | `string` | (*Optional*) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. | +| `obs["tissue"]` | `string` | (*Optional*) Specific tissue from which the cells were derived, key for context and specificity in cell studies. | +| `obs["tissue_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. | +| `obs["tissue_general"]` | `string` | (*Optional*) General category or classification of the tissue, useful for broader grouping and comparison of cell data. | +| `obs["tissue_general_ontology_term_id"]` | `string` | (*Optional*) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (`UBERON:`) is used. The term ids must be a child term of `UBERON:0001062` (anatomical entity). For cell cultures, the Cell Ontology (`CL:`) is used. The term ids cannot be `CL:0000255`, `CL:0000257` or `CL:0000548`. | +| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. | +| `obs["soma_joinid"]` | `integer` | (*Optional*) If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the cell. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors created by the normalisation method, if any. | +| `var["feature_id"]` | `string` | (*Optional*) Unique identifier for the feature, usually a ENSEMBL gene id. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `var["soma_joinid"]` | `integer` | (*Optional*) If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the feature. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A score for the feature indicating how highly variable it is. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `obsp["knn_distances"]` | `double` | K nearest neighbors distance matrix. | +| `obsp["knn_connectivities"]` | `double` | K nearest neighbors connectivities matrix. | +| `varm["pca_loadings"]` | `double` | The PCA loadings matrix. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalised expression values. 
| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived. | +| `uns["dataset_name"]` | `string` | A human-readable name for the dataset. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["pca_variance"]` | `double` | The PCA variance objects. | +| `uns["knn"]` | `object` | Supplementary K nearest neighbors data. | + +
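As a quick orientation, the snippet below shows how the slots listed above are accessed with `anndata`, using the example file referenced at the top of this section.

```python
import anndata as ad

# Example file listed above for the common dataset format.
adata = ad.read_h5ad("resources_test/common/pancreas/dataset.h5ad")

# Count data and normalised values live in layers, not in adata.X.
counts = adata.layers["counts"]
normalized = adata.layers["normalized"]

# Precomputed PCA embedding and kNN graph shipped with the common dataset.
x_pca = adata.obsm["X_pca"]
knn_distances = adata.obsp["knn_distances"]

# Dataset-level metadata used throughout the task workflows.
print(adata.uns["dataset_id"], adata.uns["normalization_id"])
print(int(adata.var["hvg"].sum()), "genes flagged as highly variable")
```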
+ +## Component type: Data processor + +Path: +[`src/dimensionality_reduction`](https://github.com/openproblems-bio/openproblems/tree/main/src/dimensionality_reduction) + +A dimensionality reduction dataset processor. + +Arguments: + +
+ +| Name | Type | Description | +|:--------------------|:-------|:---------------------------------------------------------------| +| `--input` | `file` | A dataset processed by the common dataset processing pipeline. | +| `--output_dataset` | `file` | (*Output*) The dataset to pass to a method. | +| `--output_solution` | `file` | (*Output*) The data for evaluating a dimensionality reduction. | + +
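The processor's implementation is not shown in this section, but its effect can be sketched from the Dataset and Test data formats documented below: it strips the common dataset down to the slots a method is allowed to see, and keeps the cell annotations and dataset metadata in a separate solution file. The following is a schematic sketch only, not the actual component.

```python
import anndata as ad

# Schematic sketch of the split performed by the data processor.
common = ad.read_h5ad("resources_test/common/pancreas/dataset.h5ad")

# Dataset: only the slots a method may use.
output_dataset = ad.AnnData(
    obs=common.obs[[]],
    var=common.var[["hvg_score"]],
    layers={"counts": common.layers["counts"], "normalized": common.layers["normalized"]},
    uns={"dataset_id": common.uns["dataset_id"], "normalization_id": common.uns["normalization_id"]},
)

# Solution: adds cell annotations and dataset metadata used by the metrics.
solution_uns = [
    "dataset_id", "dataset_name", "dataset_url", "dataset_reference",
    "dataset_summary", "dataset_description", "dataset_organism", "normalization_id",
]
output_solution = ad.AnnData(
    obs=common.obs[["cell_type"]],
    var=common.var[["hvg_score"]],
    layers={"counts": common.layers["counts"], "normalized": common.layers["normalized"]},
    uns={key: common.uns[key] for key in solution_uns if key in common.uns},
)

output_dataset.write_h5ad("dataset.h5ad", compression="gzip")
output_solution.write_h5ad("solution.h5ad", compression="gzip")
```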
+ +## File format: Dataset + +The dataset to pass to a method. + +Example file: +`resources_test/dimensionality_reduction/pancreas/dataset.h5ad` + +Format: + +
+ + AnnData object + var: 'hvg_score' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:----------|:-------------------------------------------------------------------------------------| +| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
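Methods that subset to highly variable genes rank on `var["hvg_score"]`; several of the method scripts later in this diff use the pattern sketched here (the value of `n_hvg` is arbitrary for illustration).

```python
import anndata as ad

adata = ad.read_h5ad("resources_test/dimensionality_reduction/pancreas/dataset.h5ad")

n_hvg = 1000  # illustrative; methods expose this as an --n_hvg argument
# Rank genes by hvg_score (higher = more variable) and keep the top n_hvg.
idx = adata.var["hvg_score"].to_numpy().argsort()[::-1][:n_hvg]
X_mat = adata.layers["normalized"][:, idx]
print("matrix passed to the method:", X_mat.shape)
```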
+ +## File format: Test data + +The data for evaluating a dimensionality reduction. + +Example file: +`resources_test/dimensionality_reduction/pancreas/solution.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type' + var: 'hvg_score' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:---------------------------------------------------------------------------------------------------------| +| `obs["cell_type"]` | `string` | Classification of the cell type based on its characteristics and function within the tissue or organism. | +| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## Component type: Control method + +Path: +[`src/dimensionality_reduction/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/dimensionality_reduction/control_methods) + +Quality control methods for verifying the pipeline. + +Arguments: + +
+ +| Name | Type | Description | +|:-------------------|:-------|:--------------------------------------------------------------| +| `--input` | `file` | The dataset to pass to a method. | +| `--input_solution` | `file` | The data for evaluating a dimensionality reduction. | +| `--output` | `file` | (*Output*) A dataset with dimensionality reduction embedding. | + +
+ +## Component type: Method + +Path: +[`src/dimensionality_reduction/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/dimensionality_reduction/methods) + +A dimensionality reduction method. + +Arguments: + +
+ +| Name | Type | Description | +|:-----------|:-------|:--------------------------------------------------------------| +| `--input` | `file` | The dataset to pass to a method. | +| `--output` | `file` | (*Output*) A dataset with dimensionality reduction embedding. | + +
+ +## Component type: Metric + +Path: +[`src/dimensionality_reduction/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/dimensionality_reduction/metrics) + +A dimensionality reduction metric. + +Arguments: + +
+ +| Name | Type | Description | +|:--------------------|:-------|:----------------------------------------------------| +| `--input_embedding` | `file` | A dataset with dimensionality reduction embedding. | +| `--input_solution` | `file` | The data for evaluating a dimensionality reduction. | +| `--output` | `file` | (*Output*) Metric score file. | + +
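Schematically, a metric component reads the embedding and the solution and writes a Score file in the format documented below. The sketch uses a placeholder score; the real metrics live under `src/tasks/dimensionality_reduction/metrics/`.

```python
import anndata as ad
import numpy as np

# Illustrative paths; real metric components receive these via viash's `par` dictionary.
embedding = ad.read_h5ad("resources_test/dimensionality_reduction/pancreas/embedding.h5ad")
solution = ad.read_h5ad("resources_test/dimensionality_reduction/pancreas/solution.h5ad")

# Placeholder score: fraction of cells with a finite embedding coordinate.
# A real metric would compare the embedding against the solution data.
x_emb = np.asarray(embedding.obsm["X_emb"])
score = float(np.isfinite(x_emb).all(axis=1).mean())

output = ad.AnnData(
    uns={
        "dataset_id": embedding.uns["dataset_id"],
        "normalization_id": embedding.uns["normalization_id"],
        "method_id": embedding.uns["method_id"],
        "metric_ids": ["placeholder_metric"],  # hypothetical metric id
        "metric_values": [score],
    },
)
output.write_h5ad("score.h5ad", compression="gzip")
```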
+ +## File format: Embedding + +A dataset with dimensionality reduction embedding. + +Example file: +`resources_test/dimensionality_reduction/pancreas/embedding.h5ad` + +Format: + +
+ + AnnData object + obsm: 'X_emb' + uns: 'dataset_id', 'method_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:-------------------------------------| +| `obsm["X_emb"]` | `double` | The dimensionally reduced embedding. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
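A few lines are enough to sanity-check that an embedding file follows this format (the path and checks are illustrative).

```python
import anndata as ad

emb = ad.read_h5ad("resources_test/dimensionality_reduction/pancreas/embedding.h5ad")

assert "X_emb" in emb.obsm, "the embedding must be stored in obsm['X_emb']"
for key in ("dataset_id", "method_id", "normalization_id"):
    assert key in emb.uns, f"missing uns['{key}']"

# Regular methods produce two columns; some controls embed into more dimensions.
print("embedding shape:", emb.obsm["X_emb"].shape)
```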
+ +## File format: Score + +Metric score file + +Example file: +`resources_test/dimensionality_reduction/pancreas/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
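Score files produced by a benchmark run can be collected into a flat table for inspection, for example as below. The glob pattern is illustrative; the benchmark workflows aggregate the same `uns` fields into `score_uns.yaml`.

```python
import glob

import anndata as ad
import numpy as np
import pandas as pd

rows = []
for path in glob.glob("output/temp/**/score.h5ad", recursive=True):  # illustrative location
    uns = ad.read_h5ad(path).uns
    metric_ids = np.atleast_1d(uns["metric_ids"])
    metric_values = np.atleast_1d(uns["metric_values"])
    for metric_id, value in zip(metric_ids, metric_values):
        rows.append({
            "dataset_id": uns["dataset_id"],
            "normalization_id": uns["normalization_id"],
            "method_id": uns["method_id"],
            "metric_id": metric_id,
            "metric_value": float(value),
        })

scores = pd.DataFrame(rows)
print(scores.head())
```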
+ diff --git a/src/tasks/dimensionality_reduction/api/comp_control_method.yaml b/src/tasks/dimensionality_reduction/api/comp_control_method.yaml new file mode 100644 index 0000000000..dfa346752f --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/comp_control_method.yaml @@ -0,0 +1,33 @@ +functionality: + namespace: dimensionality_reduction/control_methods + info: + type: control_method + type_info: + label: Control method + summary: Quality control methods for verifying the pipeline. + description: | + Control methods have the same interface as the regular methods + but also receive the solution object as input. It serves as a + starting point to test the relative accuracy of new methods in + the task, and also as a quality control for the metrics defined + in the task. + arguments: + - name: "--input" + __merge__: file_dataset.yaml + direction: input + required: true + - name: "--input_solution" + __merge__: file_solution.yaml + direction: input + required: true + - name: "--output" + __merge__: file_embedding.yaml + direction: output + required: true + test_resources: + - path: /resources_test/dimensionality_reduction/pancreas/ + dest: resources_test/dimensionality_reduction/pancreas/ + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/api/comp_method.yaml b/src/tasks/dimensionality_reduction/api/comp_method.yaml new file mode 100644 index 0000000000..34d63607a4 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/comp_method.yaml @@ -0,0 +1,27 @@ +functionality: + namespace: dimensionality_reduction/methods + info: + type: method + type_info: + label: Method + summary: A dimensionality reduction method. + description: | + A dimensionality reduction method to summarise the biological + information in a dataset in as few dimensions as possible. + arguments: + - name: "--input" + __merge__: file_dataset.yaml + direction: input + required: true + - name: "--output" + __merge__: file_embedding.yaml + direction: output + required: true + test_resources: + - path: /resources_test/dimensionality_reduction/pancreas/ + dest: resources_test/dimensionality_reduction/pancreas/ + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/dimensionality_reduction/api/comp_metric.yaml b/src/tasks/dimensionality_reduction/api/comp_metric.yaml new file mode 100644 index 0000000000..8cd90e4ca1 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/comp_metric.yaml @@ -0,0 +1,30 @@ +functionality: + namespace: dimensionality_reduction/metrics + info: + type: metric + type_info: + label: Metric + summary: A dimensionality reduction metric. + description: | + A metric for evaluating dimensionality reductions. 
+ arguments: + - name: "--input_embedding" + direction: input + __merge__: file_embedding.yaml + required: true + - name: "--input_solution" + __merge__: file_solution.yaml + direction: input + required: true + - name: "--output" + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - path: /resources_test/dimensionality_reduction/pancreas/ + dest: resources_test/dimensionality_reduction/pancreas/ + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/dimensionality_reduction/api/comp_process_dataset.yaml b/src/tasks/dimensionality_reduction/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..1f7b150871 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/comp_process_dataset.yaml @@ -0,0 +1,27 @@ +functionality: + namespace: dimensionality_reduction + info: + type: process_dataset + type_info: + label: Data processor + summary: A dimensionality reduction dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. + arguments: + - name: "--input" + __merge__: /src/datasets/api/file_common_dataset.yaml + direction: input + required: true + - name: "--output_dataset" + __merge__: file_dataset.yaml + direction: output + required: true + - name: "--output_solution" + __merge__: file_solution.yaml + direction: output + required: true + test_resources: + - path: /resources_test/common/pancreas/ + dest: resources_test/common/pancreas/ + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/api/file_common_dataset.yaml b/src/tasks/dimensionality_reduction/api/file_common_dataset.yaml new file mode 100644 index 0000000000..dba599da9a --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/file_common_dataset.yaml @@ -0,0 +1,58 @@ +type: file +example: "resources_test/dimensionality_reduction/pancreas/dataset.h5ad" +info: + label: "Dataset" + summary: "The dataset to pass to a method." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: cell_type + description: Classification of the cell type based on its characteristics and function within the tissue or organism. + required: true + var: + - type: double + name: hvg_score + description: High variability gene score (normalized dispersion). The greater, the more variable. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. 
+ required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/dimensionality_reduction/api/file_dataset.yaml b/src/tasks/dimensionality_reduction/api/file_dataset.yaml new file mode 100644 index 0000000000..8061f8f0c5 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/file_dataset.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/dimensionality_reduction/pancreas/dataset.h5ad" +info: + label: "Dataset" + summary: "The dataset to pass to a method." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + var: + - type: double + name: hvg_score + description: High variability gene score (normalized dispersion). The greater, the more variable. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/dimensionality_reduction/api/file_embedding.yaml b/src/tasks/dimensionality_reduction/api/file_embedding.yaml new file mode 100644 index 0000000000..c33d76ae8f --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/file_embedding.yaml @@ -0,0 +1,25 @@ +type: file +example: "resources_test/dimensionality_reduction/pancreas/embedding.h5ad" +info: + label: "Embedding" + summary: "A dataset with dimensionality reduction embedding." + slots: + obsm: + - type: double + name: X_emb + description: The dimensionally reduced embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + diff --git a/src/tasks/dimensionality_reduction/api/file_score.yaml b/src/tasks/dimensionality_reduction/api/file_score.yaml new file mode 100644 index 0000000000..71200ef9e1 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/file_score.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/dimensionality_reduction/pancreas/score.h5ad" +info: + label: "Score" + summary: "Metric score file" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + required: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true + required: true \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/api/file_solution.yaml b/src/tasks/dimensionality_reduction/api/file_solution.yaml new file mode 100644 index 0000000000..9d08f8fb7a --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/file_solution.yaml @@ -0,0 +1,58 @@ +type: file +example: "resources_test/dimensionality_reduction/pancreas/solution.h5ad" +info: + label: "Test data" + summary: "The data for evaluating a dimensionality reduction." 
+ slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: cell_type + description: Classification of the cell type based on its characteristics and function within the tissue or organism. + required: true + var: + - type: double + name: hvg_score + description: High variability gene score (normalized dispersion). The greater, the more variable. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/dimensionality_reduction/api/task_info.yaml b/src/tasks/dimensionality_reduction/api/task_info.yaml new file mode 100644 index 0000000000..4f24ae9764 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/task_info.yaml @@ -0,0 +1,73 @@ +name: dimensionality_reduction +label: "Dimensionality reduction for 2D visualization" +v1: + path: openproblems/tasks/dimensionality_reduction/README.md + commit: b353a462f6ea353e0fc43d0f9fcbbe621edc3a0b +summary: Reduction of high-dimensional datasets to 2D for visualization & interpretation +image: "thumbnail.svg" +motivation: | + Data visualisation is an important part of all stages of single-cell analysis, from + initial quality control to interpretation and presentation of final results. For bulk RNA-seq + studies, linear dimensionality reduction techniques such as PCA and MDS are commonly used + to visualise the variation between samples. While these methods are highly effective they + can only be used to show the first few components of variation which cannot fully represent + the increased complexity and number of observations in single-cell datasets. For this reason + non-linear techniques (most notably t-SNE and UMAP) have become the standard for visualising + single-cell studies. These methods attempt to compress a dataset into a two-dimensional space + while attempting to capture as much of the variance between observations as possible. Many + methods for solving this problem now exist. In general these methods try to preserve distances, + while some additionally consider aspects such as density within the embedded space or conservation + of continuous trajectories. Despite almost every single-cell study using one of these visualisations + there has been debate as to whether they can effectively capture the variation in single-cell + datasets [@chari2023speciousart]. +description: | + The dimensionality reduction task attempts to quantify the ability of methods to embed the + information present in complex single-cell studies into a two-dimensional space. 
Thus, this task + is specifically designed for dimensionality reduction for visualisation and does not consider other + uses of dimensionality reduction in standard single-cell workflows such as improving the + signal-to-noise ratio (and in fact several of the methods use PCA as a pre-processing step for this + reason). Unlike most tasks, methods for the dimensionality reduction task must accept a matrix + containing expression values normalised to 10,000 counts per cell and log transformed (log-10k) and + produce a two-dimensional coordinate for each cell. Pre-normalised matrices are required to + enforce consistency between the metric evaluation (which generally requires normalised data) and + the method runs. When these are not consistent, methods that use the same normalisation as used in + the metric tend to score more highly. For some methods we also evaluate the pre-processing + recommended by the method. +authors: + - name: Luke Zappia + roles: [ maintainer, author ] + info: + github: lazappi + - name: Michal Klein + roles: [ author ] + info: + github: michalk8 + - name: Scott Gigante + roles: [ author ] + info: + github: scottgigante + orcid: "0000-0002-4544-2764" + - name: Ben DeMeo + roles: [ author ] + info: + github: bendemeo + - name: Robrecht Cannoodt + roles: [ author ] + info: + github: rcannood + orcid: 0000-0003-3641-729X + - name: Kai Waldrant + roles: [ contributor ] + info: + github: KaiWaldrant + orcid: 0009-0003-8555-1361 + - name: Sai Nirmayi Yasa + roles: [ contributor ] + info: + github: sainirmayi + orcid: 0009-0003-6319-9803 + - name: Juan A. Cordero Varela + roles: [ contributor ] + info: + github: jacorvar + orcid: 0000-0002-7373-5433 diff --git a/src/tasks/dimensionality_reduction/api/thumbnail.svg b/src/tasks/dimensionality_reduction/api/thumbnail.svg new file mode 100644 index 0000000000..62911379a1 --- /dev/null +++ b/src/tasks/dimensionality_reduction/api/thumbnail.svg @@ -0,0 +1 @@ +dim-2dim-1 \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/control_methods/random_features/config.vsh.yaml b/src/tasks/dimensionality_reduction/control_methods/random_features/config.vsh.yaml new file mode 100644 index 0000000000..6c0d36ad44 --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/random_features/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "random_features" + info: + label: Random Features + summary: "Negative control by randomly embedding into a 2D space." + description: "This method serves as a negative control, where the data is randomly embedded into a two-dimensional space, with no attempt to preserve the original structure." 
+ v1: + path: openproblems/tasks/dimensionality_reduction/methods/baseline.py + commit: 80b37e7a6aa27df4436f400397564c01276817e0 + preferred_normalization: counts + variants: + random_features: + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, highmem, highcpu] \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/control_methods/random_features/script.py b/src/tasks/dimensionality_reduction/control_methods/random_features/script.py new file mode 100644 index 0000000000..7908207bda --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/random_features/script.py @@ -0,0 +1,34 @@ +import anndata as ad +import numpy as np + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output": "reduced.h5ad", +} +meta = { + "functionality_name": "random_features", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +print("Create random embedding", flush=True) +X_emb = np.random.normal(0, 1, (input.shape[0], 2)) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/control_methods/spectral_features/config.vsh.yaml b/src/tasks/dimensionality_reduction/control_methods/spectral_features/config.vsh.yaml new file mode 100644 index 0000000000..b3ae5aa95b --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/spectral_features/config.vsh.yaml @@ -0,0 +1,41 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "spectral_features" + info: + label: Spectral Features + summary: "Positive control using 1000-dimensional diffusion maps as an embedding." + description: "This serves as a positive control since it uses 1000-dimensional diffusion maps as an embedding." + v1: + path: openproblems/tasks/dimensionality_reduction/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + spectral_features: + arguments: + - name: "--n_comps" + type: integer + default: 1000 + description: "Number of components to use for the embedding." + - name: t + type: integer + default: 1 + description: "Power to which the eigenvalues are raised." + - name: n_retries + type: integer + default: 1 + description: "Number of times to retry if the embedding fails, each time adding noise."
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - umap-learn + - scipy + - numpy + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/control_methods/spectral_features/script.py b/src/tasks/dimensionality_reduction/control_methods/spectral_features/script.py new file mode 100644 index 0000000000..cf8633120c --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/spectral_features/script.py @@ -0,0 +1,77 @@ +import anndata as ad +import umap + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output": "reduced.h5ad", + "n_comps": 2, +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +def diffusion_map(graph, n_comps, t, n_retries): + import numpy as np + import scipy.sparse.linalg + + diag_data = np.asarray(graph.sum(axis=0)) + identity = scipy.sparse.identity(graph.shape[0], dtype=np.float64) + diag = scipy.sparse.spdiags( + 1.0 / np.sqrt(diag_data), 0, graph.shape[0], graph.shape[0] + ) + laplacian = identity - diag * graph * diag + num_lanczos_vectors = max(2 * n_comps + 1, int(np.sqrt(graph.shape[0]))) + try: + eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh( + laplacian, + n_comps, + which="SM", + ncv=num_lanczos_vectors, + tol=1e-4, + v0=np.ones(laplacian.shape[0]), + maxiter=graph.shape[0] * 5, + ) + return (eigenvalues**t) * eigenvectors + except scipy.sparse.linalg.ArpackNoConvergence: + if n_retries > 0: + # add some noise and try again + graph_rand = graph.copy().tocoo() + graph_rand.row = np.random.choice( + graph_rand.shape[0], len(graph_rand.row), replace=True + ) + graph_rand.data *= 0.01 + return diffusion_map( + graph + graph_rand, n_comps, t, n_retries=n_retries - 1 + ) + else: + raise + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +print("Create high dimensionally embedding with all features", flush=True) + +n_comps = min(par["n_comps"], min(input.shape) - 2) + +graph = umap.UMAP(transform_mode="graph").fit_transform(input.layers["normalized"]) + +X_emb = diffusion_map(graph, n_comps, t=par["t"], n_retries=par["n_retries"]) + + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/control_methods/true_features/config.vsh.yaml b/src/tasks/dimensionality_reduction/control_methods/true_features/config.vsh.yaml new file mode 100644 index 0000000000..a83d393072 --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/true_features/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "true_features" + info: + label: True Features + summary: "Positive control by retaining the dimensionality without loss of information." 
+ description: "This serves as a positive control since the original high-dimensional data is retained as is, without any loss of information" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + true_features: + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/control_methods/true_features/script.py b/src/tasks/dimensionality_reduction/control_methods/true_features/script.py new file mode 100644 index 0000000000..1a58cd4984 --- /dev/null +++ b/src/tasks/dimensionality_reduction/control_methods/true_features/script.py @@ -0,0 +1,33 @@ +import anndata as ad + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output": "reduced.h5ad", +} +meta = { + "functionality_name": "true_features", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +print("Create high dimensionally embedding with all features", flush=True) +X_emb = input.layers["normalized"].toarray() + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/densmap/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/densmap/config.vsh.yaml new file mode 100644 index 0000000000..ff5764a561 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/densmap/config.vsh.yaml @@ -0,0 +1,45 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "densmap" + info: + label: densMAP + summary: "Modified UMAP with preservation of local density information" + description: "A modification of UMAP that adds an extra cost term in order to preserve information about the relative local density of the data. It is performed on the same inputs as UMAP." + reference: "narayan2021assessing" + repository_url: https://github.com/lmcinnes/umap + documentation_url: https://github.com/lmcinnes/umap#readme + v1: + path: openproblems/tasks/dimensionality_reduction/methods/umap.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + densmap_logCP10k: + densmap_pca_logCP10k: + n_pca_dims: 50 + densmap_logCP10k_1kHVG: + n_hvg: 1000 + densmap_pca_logCP10k_1kHVG: + n_pca_dims: 50 + n_hvg: 1000 + arguments: + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + - name: "--n_pca_dims" + type: integer + description: Number of PCA dimensions to use. If not specified, no PCA will be performed. 
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - umap-learn + - pynndescent==0.5.11 + - type: native + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/densmap/script.py b/src/tasks/dimensionality_reduction/methods/densmap/script.py new file mode 100644 index 0000000000..985c95d78a --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/densmap/script.py @@ -0,0 +1,54 @@ +import anndata as ad +from umap import UMAP +import scanpy as sc + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_pca_dims": 50, + "n_hvg": 1000 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +if par["n_pca_dims"]: + print("Apply PCA to normalized data", flush=True) + umap_input = sc.tl.pca( + X_mat, + n_comps=par["n_pca_dims"], + svd_solver="arpack" + ) +else: + print("Use normalized data as input for UMAP", flush=True) + umap_input = X_mat + +print("Run densMAP", flush=True) +X_emb = UMAP(densmap=True, random_state=42).fit_transform(umap_input) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/diffusion_map/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/diffusion_map/config.vsh.yaml new file mode 100644 index 0000000000..ced082c708 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/diffusion_map/config.vsh.yaml @@ -0,0 +1,31 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: diffusion_map + info: + label: Diffusion Map + summary: Finding meaningful geometric descriptions of datasets using diffusion maps. + description: Implements diffusion map method of data parametrization, including creation and visualization of diffusion map, clustering with diffusion K-means and regression using adaptive regression model. + reference: coifman2006diffusion + documentation_url: https://bioconductor.org/packages/release/bioc/html/destiny.html + repository_url: https://github.com/theislab/destiny + v1: + path: openproblems/tasks/dimensionality_reduction/methods/diffusion_map.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + resources: + - type: r_script + path: script.R + arguments: + - name: "--n_dim" + type: integer + description: Number of dimensions. 
+ default: 3 +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + bioc: destiny + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/diffusion_map/script.R b/src/tasks/dimensionality_reduction/methods/diffusion_map/script.R new file mode 100644 index 0000000000..a9146c8db9 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/diffusion_map/script.R @@ -0,0 +1,37 @@ +requireNamespace("anndata", quietly = TRUE) +requireNamespace("diffusionMap", quietly = TRUE) + +## VIASH START +par <- list( + input = "resources_test/dimensionality_reduction/pancreas/dataset.h5ad", + output = "output.h5ad", + n_dim = 3 +) +## VIASH END + +cat("Reading input files\n") +input <- anndata::read_h5ad(par$input) + +cat("Running destiny diffusion map\n") +# create SummarizedExperiment object +sce <- SingleCellExperiment::SingleCellExperiment( + assays = list( + logcounts = t(as.matrix(input$layers[["normalized"]])) + ) +) +dm <- destiny::DiffusionMap(sce) +X_emb <- destiny::eigenvectors(dm)[, seq_len(par$n_dim)] + +cat("Write output AnnData to file\n") +output <- anndata::AnnData( + uns = list( + dataset_id = input$uns[["dataset_id"]], + normalization_id = input$uns[["normalization_id"]], + method_id = meta$functionality_name + ), + obsm = list( + X_emb = X_emb + ), + shape = input$shape +) +output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/dimensionality_reduction/methods/ivis/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/ivis/config.vsh.yaml new file mode 100644 index 0000000000..aa3c5ca0b4 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/ivis/config.vsh.yaml @@ -0,0 +1,44 @@ +# see https://github.com/openproblems-bio/openproblems/blob/9ebb777b3b76337e731a3b99f4bf39462a15c4cc/openproblems/tasks/dimensionality_reduction/methods/ivis.py + +__merge__: ../../api/comp_method.yaml +functionality: + name: "ivis" + info: + label: "ivis" + summary: "Structure-preserving dimensionality reduction using a siamese neural network trained on triplets." + description: | + ivis is a machine learning library for reducing dimensionality of very large datasets using Siamese Neural Networks. + ivis preserves global data structures in a low-dimensional space, adds new data points to existing embeddings using + a parametric mapping function, and scales linearly to millions of observations. + reference: szubert2019structurepreserving + repository_url: "https://github.com/beringresearch/ivis" + documentation_url: "https://github.com/beringresearch/ivis#readme" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/ivis.py + commit: 93d2161a08da3edf249abedff5111fb5ce527552 + preferred_normalization: log_cp10k + variants: + ivis_logCPM_1kHVG: + arguments: + - name: '--n_pca_dims' + type: integer + default: 50 + description: Number of principal components of PCA to use. + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. 
+ default: 1000 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - ivis[cpu] + - tensorflow<2.16 + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/ivis/script.py b/src/tasks/dimensionality_reduction/methods/ivis/script.py new file mode 100644 index 0000000000..1eade8b74d --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/ivis/script.py @@ -0,0 +1,57 @@ +import anndata as ad +import scanpy as sc +from ivis import Ivis + +# todo: allow using gpus instead! + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/dataset.h5ad", + "output": "reduced.h5ad", + "n_hvg": 1000, + "n_pca_dims": 50 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} highly variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +print(f"Running PCA with {par['n_pca_dims']} dimensions", flush=True) +X_pca = sc.tl.pca(X_mat, n_comps=par["n_pca_dims"], svd_solver="arpack") + +print("Run ivis", flush=True) +# parameters taken from: +# https://bering-ivis.readthedocs.io/en/latest/scanpy_singlecell.html#reducing-dimensionality-using-ivis +ivis = Ivis( + k=15, + model="maaten", + n_epochs_without_progress=5, + verbose=0, + embedding_dims=2, +) +X_emb = ivis.fit_transform(X_pca) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/lmds/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/lmds/config.vsh.yaml new file mode 100644 index 0000000000..2b651271a9 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/lmds/config.vsh.yaml @@ -0,0 +1,44 @@ +__merge__: ../../api/comp_method.yaml + +functionality: + name: lmds + + info: + label: LMDS + summary: Landmark Multi-Dimensional Scaling + description: | + Landmark Multi-Dimensional Scaling (LMDS) is a non-linear dimensionality reduction method based on the concept of multi-dimensional scaling. + Rather than embedding the full pairwise distance matrix, it only computes distances between all cells and a set of landmark cells, which makes it scalable to large datasets. + preferred_normalization: log_cp10k + reference: saelens2019comparison + documentation_url: https://dynverse.org/lmds/ + repository_url: https://github.com/dynverse/lmds + + arguments: + - name: "--n_dim" + type: integer + description: Number of dimensions. + default: 2 + - name: "--n_landmarks" + type: integer + description: Number of landmarks. + default: 1000 + - name: "--distance_method" + type: string + description: Distance metric to use when computing distances between cells and landmarks.
+ choices: ["euclidean", "pearson", "spearman", "cosine", "chisquared", "hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"] + default: "pearson" + + resources: + - type: r_script + path: script.R + +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ Matrix, lmds ] + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/tasks/dimensionality_reduction/methods/lmds/script.R b/src/tasks/dimensionality_reduction/methods/lmds/script.R new file mode 100644 index 0000000000..ae9461c496 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/lmds/script.R @@ -0,0 +1,39 @@ +requireNamespace("anndata", quietly = TRUE) +requireNamespace("lmds", quietly = TRUE) + +## VIASH START +par <- list( + input = "resources_test/dimensionality_reduction/pancreas/dataset.h5ad", + output = "output.h5ad", + n_dim = 3, + n_landmarks = 1000, + distance_method = "pearson" +) +## VIASH END + +cat("Reading input files\n") +input <- anndata::read_h5ad(par$input) + +# TODO: if we wanted to, we could compute the distance +# matrix in batches. This would be useful for large datasets. +cat("Running LMDS\n") +X_emb <- lmds::lmds( + input$layers[["normalized"]], + ndim = par$n_dim, + num_landmarks = par$n_landmarks, + distance_method = par$distance_method +) + +cat("Write output AnnData to file\n") +output <- anndata::AnnData( + uns = list( + dataset_id = input$uns[["dataset_id"]], + method_id = meta$functionality_name, + normalization_id = input$uns[["normalization_id"]] + ), + obsm = list( + X_emb = X_emb + ), + shape = input$shape +) +output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/dimensionality_reduction/methods/neuralee/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/neuralee/config.vsh.yaml new file mode 100644 index 0000000000..0d3d0234c4 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/neuralee/config.vsh.yaml @@ -0,0 +1,55 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "neuralee" + info: + label: NeuralEE + summary: "Non-linear method that uses a neural network to preserve pairwise distances between data points in a high-dimensional space." + description: | + A neural network implementation of elastic embedding. It is a + non-linear method that preserves pairwise distances between data points. + NeuralEE uses a neural network to optimize an objective function that + measures the difference between pairwise distances in the original + high-dimensional space and the two-dimensional space. It is computed on both + the recommended input from the package authors of 500 HVGs selected from a + logged expression matrix (without sequencing depth scaling) and the default + logCPM matrix with 1000 HVGs. + reference: "xiong2020neuralee" + repository_url: "https://github.com/HiBearME/NeuralEE" + documentation_url: "https://github.com/HiBearME/NeuralEE#readme" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/neuralee.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + neuralee_default: + normalize: true + n_hvg: 500 + neuralee_logCP10k_1kHVG: + normalize: false + n_hvg: 1000 + arguments: + - name: "--n_iter" + type: integer + description: Number of iterations. + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. 
+ default: 1000 + - name: "--normalize" + type: boolean + default: false + description: Whether to perform own normalization + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - torch + - "git+https://github.com/michalk8/neuralee@8946abf" + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/neuralee/script.py b/src/tasks/dimensionality_reduction/methods/neuralee/script.py new file mode 100644 index 0000000000..bd13a2f34d --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/neuralee/script.py @@ -0,0 +1,78 @@ +import anndata as ad +import torch +from neuralee.embedding import NeuralEE +from neuralee.dataset import GeneExpressionDataset + +# todo: allow gpu +device = torch.device("cpu") + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_hvg": 1000, + "n_iter": 10, + "normalize": True +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +if par["normalize"]: + print("Performing own normalization", flush=True) + # perform own normalization based on the "recommended" preprocessing taken from example notebooks, e.g.: + # https://github.com/HiBearME/NeuralEE/blob/master/tests/notebooks/retina_dataset.ipynb + dataset = GeneExpressionDataset(input.layers["counts"]) + dataset.log_shift() + if par["n_hvg"]: + dataset.subsample_genes(par["n_hvg"]) + dataset.standardscale() + +else: + X_mat = input.layers["normalized"] + + if par["n_hvg"]: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[-par["n_hvg"]:] + X_mat = X_mat[:, idx] + + print("Using pre-normalized data", flush=True) + dataset = GeneExpressionDataset(X_mat) + + +# estimate the affinity matrix +batch_size = min(1000, input.n_obs) +print(f"Use {batch_size} cells as batch to estimate the affinity matrix", flush=True) +dataset.affinity_split(N_small=batch_size) + +print("Create NeuralEE object", flush=True) +NEE = NeuralEE(dataset, d=2, device=device) +fine_tune_kwargs = dict(verbose=False) + +if par["n_iter"]: + fine_tune_kwargs["maxit"] = par["n_iter"] + +print("Run NeuralEE", flush=True) +res = NEE.fine_tune(**fine_tune_kwargs) + +X_emb = res["X"].detach().cpu().numpy() + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/pca/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/pca/config.vsh.yaml new file mode 100644 index 0000000000..11d3841fb6 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/pca/config.vsh.yaml @@ -0,0 +1,40 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "pca" + info: + label: "PCA" + summary: A linear method that finds orthogonal directions to compute the two-dimensional embedding. + description: | + Principal Component Analysis is a linear method that finds orthogonal + directions in the data that capture the most variance. 
The first two + principal components are chosen as the two-dimensional embedding. PCA + is calculated on the logCPM expression matrix with and without selecting 1000 + HVGs. + reference: pearson1901pca + repository_url: https://github.com/scikit-learn/scikit-learn + documentation_url: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html + v1: + path: openproblems/tasks/dimensionality_reduction/methods/pca.py + commit: 154ccb9fd99113f3d28d9c3f139194539a0290f9 + preferred_normalization: log_cp10k + variants: + pca_logCP10k: + pca_logCP10k_1kHVG: + n_hvg: 1000 + arguments: + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scanpy + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/pca/script.py b/src/tasks/dimensionality_reduction/methods/pca/script.py new file mode 100644 index 0000000000..81cff3441f --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/pca/script.py @@ -0,0 +1,41 @@ +import anndata as ad +import scanpy as sc + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_hvg": 1000 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} highly variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +print("Running PCA", flush=True) +X_emb = sc.tl.pca(X_mat, n_comps=2, svd_solver="arpack")[:, :2] + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/phate/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/phate/config.vsh.yaml new file mode 100644 index 0000000000..ff63659780 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/phate/config.vsh.yaml @@ -0,0 +1,58 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "phate" + info: + label: PHATE + summary: Preserving trajectories in a dataset by using heat diffusion potential. + description: | + PHATE or "Potential of Heat-diffusion for Affinity-based Transition + Embedding" uses the potential of heat diffusion to preserve trajectories in a + dataset via a diffusion process. It is an affinity-based method that + creates an embedding by finding the dominant eigenvalues of a Markov + transition matrix. We evaluate several variants including using the + recommended square-root transformed CPM matrix as input, this input with + the gamma parameter set to zero, and the normal logCPM transformed matrix with + and without HVG selection.
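+ # Note: the variants listed further below differ only in the "--gamma" argument and the normalization layer they consume; e.g. "phate_sqrt" keeps the recommended sqrt_cp10k input but sets gamma to 0, while the *_logCP10k variants switch the preferred normalization to log_cp10k (the 1kHVG variant additionally subsets to 1000 HVGs).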
+ reference: "moon2019visualizing" + repository_url: "https://github.com/KrishnaswamyLab/PHATE" + documentation_url: "https://github.com/KrishnaswamyLab/PHATE#readme" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/phate.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: sqrt_cp10k + variants: + phate_default: + phate_sqrt: + gamma: 0 + phate_logCP10k: + preferred_normalization: log_cp10k + phate_logCP10k_1kHVG: + n_hvg: 1000 + preferred_normalization: log_cp10k + arguments: + - name: '--n_pca_dims' + type: integer + default: 50 + description: Number of principal components of PCA to use. + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + - name: '--gamma' + type: double + description: Gamma value + default: 1 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - phate==1.0.* + - scprep + - "scikit-learn<1.2" + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/phate/script.py b/src/tasks/dimensionality_reduction/methods/phate/script.py new file mode 100644 index 0000000000..a21d9e0d87 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/phate/script.py @@ -0,0 +1,45 @@ +import anndata as ad +from phate import PHATE + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_pca_dims": 50, + "n_hvg": 1000, + "gamma": 1 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Subsetting to {par['n_hvg']} HVG", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +print("Run PHATE", flush=True) +phate_op = PHATE(n_pca=par["n_pca_dims"], verbose=False, n_jobs=-1, gamma=par["gamma"]) +X_emb = phate_op.fit_transform(X_mat) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/pymde/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/pymde/config.vsh.yaml new file mode 100644 index 0000000000..2f733bb714 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/pymde/config.vsh.yaml @@ -0,0 +1,41 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: pymde + info: + label: PyMDE + summary: "A Python implementation of Minimum-Distortion Embedding" + description: | + PyMDE is a Python implementation of Minimum-Distortion Embedding. It is a non-linear + method that preserves distances between cells or neighbourhoods in the original space. 
+ reference: agrawal2021mde + repository_url: https://github.com/cvxgrp/pymde + documentation_url: https://pymde.org + v1: + path: openproblems/tasks/dimensionality_reduction/methods/pymde.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + arguments: + - name: --embed_method + type: string + description: Whether to create the embedding by preserving neighbors or distances. + default: neighbors + choices: [ neighbors, distances ] + - name: --n_hvg + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + - name: --n_pca_dims + type: integer + description: Number of principal components to use for the initial PCA step. + default: 100 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: pymde + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/pymde/script.py b/src/tasks/dimensionality_reduction/methods/pymde/script.py new file mode 100644 index 0000000000..612582d8c3 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/pymde/script.py @@ -0,0 +1,59 @@ +import anndata as ad +import scanpy as sc +import pymde + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/dataset.h5ad", + "output": "reduced.h5ad", + "embed_method": "neighbors", + "n_hvg": 1000, + "n_pca_dims": 50, +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +if par["embed_method"] == "neighbors": + mde_fn = pymde.preserve_neighbors +elif par["embed_method"] == "distances": + mde_fn = pymde.preserve_distances +else: + raise ValueError(f"Unknown embedding method: {par['embed_method']}") + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} highly variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +print("Compute PCA", flush=True) +X_pca = sc.tl.pca(X_mat, n_comps=par["n_pca_dims"], svd_solver="arpack") + +print("Run MDE", flush=True) +X_emb = ( + mde_fn(X_pca, embedding_dim=2, verbose=True) + .embed(verbose=True) + .detach() + .numpy() +) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/simlr/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/simlr/config.vsh.yaml new file mode 100644 index 0000000000..ba4b7b3b84 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/simlr/config.vsh.yaml @@ -0,0 +1,57 @@ +__merge__: ../../api/comp_method.yaml + +functionality: + name: simlr + + info: + label: SIMLR + summary: Multikernel-based learning of distance metrics from gene expression data for dimension reduction, clustering and visualization.
+ description: | + Single-cell Interpretation via Multikernel LeaRning (SIMLR) learns cell-to-cell similarity measures from single-cell RNA-seq data using Gaussian kernels with various hyperparameters in order to perform dimension reduction, clustering and visualization. + SIMLR assumes that if C separable populations exist among the N cells, then the similarity matrix should have an approximate block-diagonal structure with C blocks whereby cells have larger similarities to other cells within the same subpopulations. Learned similarity between two cells should be small if the Euclidean distance between them is large. The cell-to-cell similarity is computed using an optimization framework over an N x N similarity matrix, a low-dimensional auxiliary matrix enforcing a low-rank constraint on the similarity matrix, and the kernel weights. + Dimension reduction is achieved by the stochastic neighbor embedding methodology with the learned similarities as input. + preferred_normalization: log_cp10k + reference: "wang2017visualization" + documentation_url: https://github.com/BatzoglouLabSU/SIMLR/blob/SIMLR/README.md + repository_url: https://github.com/BatzoglouLabSU/SIMLR + + arguments: + - name: "--n_dim" + type: integer + description: Number of dimensions. + - name: "--n_clusters" + type: integer + description: Number of clusters to be estimated over the input dataset. + - name: "--tuning_param" + type: integer + default: 10 + description: Tuning parameter used when constructing the multiple Gaussian kernels. + - name: "--impute" + type: boolean + default: false + description: Should the input data be imputed? + - name: "--normalize" + type: boolean + default: false + description: Should the input data be normalized? + - name: "--cores_ratio" + type: integer + default: 1 + description: Ratio of the number of cores to be used when computing the multi-kernel.
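+ # Note: when "--n_clusters" is left unset, script.R estimates it by running SIMLR::SIMLR_Estimate_Number_of_Clusters() over 2 to 5 clusters and picking the value that minimises the K2 heuristic; the result is then passed as the "c" argument of SIMLR::SIMLR().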
+ + resources: + - type: r_script + path: script.R + +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + packages: [ grDevices ] + cran: [ Matrix, parallel, Rcpp, pracma, RcppAnnoy, RSpectra, igraph ] + bioc: [ SIMLR ] + - type: native + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/tasks/dimensionality_reduction/methods/simlr/script.R b/src/tasks/dimensionality_reduction/methods/simlr/script.R new file mode 100644 index 0000000000..0622076c08 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/simlr/script.R @@ -0,0 +1,69 @@ +requireNamespace("anndata", quietly = TRUE) +requireNamespace("SIMLR", quietly = TRUE) + +## VIASH START +par <- list( + input = "resources_test/dimensionality_reduction/pancreas/dataset.h5ad", + output = "output.h5ad", + n_clusters = NULL, + n_dim = NA, + tuning_param = 10, + impute = FALSE, + normalize = FALSE, + cores_ratio = 1 +) +meta <- list( + functionality_name = "simlr" +) +## VIASH END + +cat("Reading input files\n") +input <- anndata::read_h5ad(par$input) + +X <- t(as.matrix(input$layers[["normalized"]])) + +if (is.null(par$n_clusters)) { + cat("Estimating the number of clusters\n") + set.seed(1) + NUMC = 2:5 + estimates <- SIMLR::SIMLR_Estimate_Number_of_Clusters( + X = X, + NUMC = NUMC, + cores.ratio = par$cores_ratio + ) + n_clusters <- NUMC[which.min(estimates$K2)] +} else { + n_clusters <- par$n_clusters +} + +if (is.null(par$n_dim)) { + n_dim <- NA +} else { + n_dim <- par$n_dim +} + +cat("Running SIMLR\n") +simlr_result <- SIMLR::SIMLR( + X = X, + c = n_clusters, + no.dim = n_dim, + k = par$tuning_param, + if.impute = par$impute, + normalize = par$normalize, + cores.ratio = par$cores_ratio +) +obsm_X_emb <- simlr_result$ydata + +cat("Write output AnnData to file\n") +output <- anndata::AnnData( + uns = list( + dataset_id = input$uns[["dataset_id"]], + method_id = meta$functionality_name, + normalization_id = input$uns[["normalization_id"]] + ), + obsm = list( + X_emb = obsm_X_emb + ), + shape = input$shape +) +output$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/dimensionality_reduction/methods/tsne/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/tsne/config.vsh.yaml new file mode 100644 index 0000000000..cedaba0484 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/tsne/config.vsh.yaml @@ -0,0 +1,49 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "tsne" + info: + label: t-SNE + summary: "Minimizing Kullback-Leibler divergence by converting similarities into joint probabilities between data points and the low/high dimensional embedding." + description: | + t-distributed Stochastic Neighbor Embedding converts similarities + between data points to joint probabilities and tries to minimize the + Kullback-Leibler divergence between the joint probabilities of the + low-dimensional embedding and the high-dimensional data. We use the + implementation in the scanpy package with the result of PCA on the logCPM + expression matrix (with and without HVG selection). 
+ reference: vandermaaten2008visualizing + repository_url: "https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE" + documentation_url: "https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/tsne.py + commit: 154ccb9fd99113f3d28d9c3f139194539a0290f9 + preferred_normalization: log_cp10k + variants: + tsne_logCP10k: + tsne_logCP10k_1kHVG: + n_hvg: 1000 + arguments: + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + - name: "--n_pca_dims" + type: integer + description: Number of PCA dimensions to use. If not specified, no PCA will be performed. + default: 50 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: apt + packages: + - cmake + - gcc + - type: python + github: + - DmitryUlyanov/Multicore-TSNE + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/tsne/script.py b/src/tasks/dimensionality_reduction/methods/tsne/script.py new file mode 100644 index 0000000000..171e17bded --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/tsne/script.py @@ -0,0 +1,47 @@ +import anndata as ad +import scanpy as sc + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_pca_dims": 50, + "n_hvg": 1000 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) + +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Subsetting to {par['n_hvg']} HVG", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +print("Computing PCA", flush=True) +input.obsm["X_pca"] = sc.tl.pca(X_mat, n_comps=par["n_pca_dims"], svd_solver="arpack") + +print("Run t-SNE", flush=True) +sc.tl.tsne(input, use_rep="X_pca", n_pcs=par["n_pca_dims"]) +X_emb = input.obsm["X_tsne"].copy() + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/methods/umap/config.vsh.yaml b/src/tasks/dimensionality_reduction/methods/umap/config.vsh.yaml new file mode 100644 index 0000000000..a073e9dbe3 --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/umap/config.vsh.yaml @@ -0,0 +1,50 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "umap" + info: + label: UMAP + summary: "A manifold learning algorithm that utilizes topological data analysis for dimension reduction." + description: | + Uniform Manifold Approximation and Projection is an algorithm for + dimension reduction based on manifold learning techniques and ideas from + topological data analysis. We perform UMAP on the logCPM expression matrix + before and after HVG selection and with and without PCA as a pre-processing + step. 
+ reference : "mcinnes2018umap" + repository_url: "https://github.com/lmcinnes/umap" + documentation_url: "https://github.com/lmcinnes/umap#readme" + v1: + path: openproblems/tasks/dimensionality_reduction/methods/umap.py + commit: 14d70b330cae09527a6d4c4e552db240601e31cf + preferred_normalization: log_cp10k + variants: + umap_logCP10k: + umap_pca_logCP10k: + n_pca_dims: 50 + umap_logCP10k_1kHVG: + n_hvg: 1000 + umap_pca_logCP10k_1kHVG: + n_pca_dims: 50 + n_hvg: 1000 + arguments: + - name: "--n_hvg" + type: integer + description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset. + default: 1000 + - name: "--n_pca_dims" + type: integer + description: Number of PCA dimensions to use. If not specified, no PCA will be performed. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - umap-learn + - pynndescent==0.5.11 + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/methods/umap/script.py b/src/tasks/dimensionality_reduction/methods/umap/script.py new file mode 100644 index 0000000000..800e65328c --- /dev/null +++ b/src/tasks/dimensionality_reduction/methods/umap/script.py @@ -0,0 +1,54 @@ +import anndata as ad +from umap import UMAP +import scanpy as sc + +## VIASH START +par = { + "input": "resources_test/dimensionality_reduction/pancreas/train.h5ad", + "output": "reduced.h5ad", + "n_pca_dims": 50, + "n_hvg": 1000 +} +meta = { + "functionality_name": "foo", +} +## VIASH END + +print("Load input data", flush=True) +input = ad.read_h5ad(par["input"]) +X_mat = input.layers["normalized"] + +if par["n_hvg"]: + print(f"Select top {par['n_hvg']} high variable genes", flush=True) + idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]] + X_mat = X_mat[:, idx] + +if par["n_pca_dims"]: + print("Apply PCA to normalized data", flush=True) + umap_input = sc.tl.pca( + X_mat, + n_comps=par["n_pca_dims"], + svd_solver="arpack" + ) +else: + print("Use normalized data as input for UMAP", flush=True) + umap_input = X_mat + +print("Run UMAP", flush=True) +X_emb = UMAP(densmap=False, random_state=42).fit_transform(umap_input) + +print("Create output AnnData", flush=True) +output = ad.AnnData( + obs=input.obs[[]], + obsm={ + "X_emb": X_emb + }, + uns={ + "dataset_id": input.uns["dataset_id"], + "normalization_id": input.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/metrics/clustering_performance/config.vsh.yaml b/src/tasks/dimensionality_reduction/metrics/clustering_performance/config.vsh.yaml new file mode 100644 index 0000000000..67f1078f13 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/clustering_performance/config.vsh.yaml @@ -0,0 +1,61 @@ +__merge__: ../../api/comp_metric.yaml + +functionality: + name: clustering_performance + info: + metrics: + - name: normalized_mutual_information + label: NMI + summary: Normalized Mutual Information (NMI) is a measure of the concordance between clustering obtained from the reduced-dimensional embeddings and the cell labels. 
+ description: | + The Normalized Mutual Information (NMI) is a measure of the similarity between cluster labels obtained from the clustering of dimensionality reduction embeddings and the true cell labels. It is a normalization of the Mutual Information (MI) score to scale the results between 0 (no mutual information) and 1 (perfect correlation). + Mutual Information quantifies the "amount of information" obtained about one random variable by observing the other random variable. Assuming two label assignments X and Y, it is given by: + $MI(X,Y) = \sum_{x=1}^{X}\sum_{y=1}^{Y}p(x,y)log(\frac{P(x,y)}{P(x)P'(y)})$, + where P(x,y) is the joint probability mass function of X and Y, and P(x), P'(y) are the marginal probability mass functions of X and Y respectively. The mutual information is normalized by some generalized mean of H(X) and H(Y). Therefore, Normalized Mutual Information can be defined as: + $NMI(X,Y) = \frac{MI(X,Y)}{mean(H(X),H(Y))}$, + where H(X) and H(Y) are the entropies of X and Y respectively. Higher NMI score suggests that the method is effective in preserving relevant information. + reference: emmons2016analysis + documentation_url: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html + repository_url: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html + min: 0 + max: 1 + maximize: true + - name: adjusted_rand_index + label: ARI + summary: Adjusted Rand Index (ARI) is a measure of the similarities between two cluster assignments of the reduced-dimensional embeddings and the true cell types. + description: | + Adjusted Rand Index (ARI) is a measure of similarity between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted (from the reduced dimensional embeddings) and true clusterings (cell type labels). It is the Rand Index (RI) adjusted for chance. + Assuming the C as the cell type labels and K as the clustering of the reduced dimensional embedding, Rand Index can be defined as: + $RI = \frac{a + b}{{C}_{2}^{n_{samples}}}$, + where 'a' is the number of pairs of elements that are in the same set in C and in the same set in K, 'b' is the number of pairs of elements that are in different sets in C and in different sets in K, and ${C}_{2}^{n_{samples}}$ is the total number of possible pairs in the dataset. Random label assignments can be discounted as follows: + $ARI = \frac{RI - E[RI]}{max(RI) - E[RI]}$, + where E[RI] is the expected RI of random labellings. + reference: santos2009on + documentation_url: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html#sklearn.metrics.adjusted_rand_score + repository_url: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.adjusted_rand_score.html#sklearn.metrics.adjusted_rand_score + min: 0 + max: 1 + maximize: true + + # Component-specific parameters + arguments: + - name: "--nmi_avg_method" + type: string + default: arithmetic + description: Method to compute normalizer in the denominator for normalized mutual information score calculation. 
+ choices: [ min, geometric, arithmetic, max ] + + resources: + - type: python_script + path: script.py + +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: [ scikit-learn, scanpy, leidenalg ] + - type: native + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/dimensionality_reduction/metrics/clustering_performance/script.py b/src/tasks/dimensionality_reduction/metrics/clustering_performance/script.py new file mode 100644 index 0000000000..eff2d5cd97 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/clustering_performance/script.py @@ -0,0 +1,63 @@ +import anndata as ad +import scanpy as sc +from sklearn.cluster import KMeans +from sklearn.metrics import normalized_mutual_info_score +from sklearn.metrics import adjusted_rand_score + +## VIASH START +par = { + 'input_embedding': 'resources_test/dimensionality_reduction/pancreas/embedding.h5ad', + 'input_solution': 'resources_test/dimensionality_reduction/pancreas/solution.h5ad', + 'output': 'output.h5ad', + 'nmi_avg_method': 'arithmetic' +} +meta = { + 'functionality_name': 'clustering_performance' +} +## VIASH END + +print('Reading input files', flush=True) +input_embedding = ad.read_h5ad(par['input_embedding']) +input_solution = ad.read_h5ad(par['input_solution']) + +print('Compute metrics', flush=True) + +# Perform Leiden clustering on dimensionlity reduction embedding +n = 20 +resolutions = [2 * x / n for x in range(1, n + 1)] +score_max = 0 +res_max = resolutions[0] +key_max = None +score_all = [] + +if "neighbors" not in input_embedding.uns: + sc.pp.neighbors(input_embedding, use_rep="X_emb") + +for res in resolutions: + key_added = f"X_emb_leiden_{res}" + sc.tl.leiden(input_embedding, resolution=res, key_added=key_added) + score = normalized_mutual_info_score(input_solution.obs["cell_type"], input_embedding.obs[key_added], average_method = par['nmi_avg_method']) + score_all.append(score) + + if score_max < score: + score_max = score + res_max = res + key_max = key_added + +# Compute NMI scores +nmi = normalized_mutual_info_score(input_solution.obs["cell_type"], input_embedding.obs[key_max], average_method = par['nmi_avg_method']) + +# Compute ARI scores +ari = adjusted_rand_score(input_solution.obs["cell_type"], input_embedding.obs[key_max]) + +print("Write output AnnData to file", flush=True) +output = ad.AnnData( + uns={ + 'dataset_id': input_embedding.uns['dataset_id'], + 'normalization_id': input_embedding.uns['normalization_id'], + 'method_id': input_embedding.uns['method_id'], + 'metric_ids': [ 'normalized_mutual_information', 'adjusted_rand_index' ], + 'metric_values': [ nmi, ari ] + } +) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/dimensionality_reduction/metrics/coranking/config.vsh.yaml b/src/tasks/dimensionality_reduction/metrics/coranking/config.vsh.yaml new file mode 100644 index 0000000000..6787e88f7e --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/coranking/config.vsh.yaml @@ -0,0 +1,166 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "coranking" + # description: | + # This is a set of metrics which all use a co-ranking matrix as the basis of the metric. + info: + metrics: + - name: continuity_at_k30 + label: Continuity at k=30 + reference: venna2006local + summary: "The continuity metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." 
+ description: "The continuity metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: trustworthiness_at_k30 + label: Trustworthiness at k=30 + summary: "The trustworthiness metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + description: "The trustworthiness metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: venna2006local + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: qnx_at_k30 + label: The value for QNX at k=30 + summary: "The QNX metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + description: "The QNX metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: lee2009quality + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: lcmc_at_k30 + label: The value for LCMC at k=30 + summary: "The LCMC metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + description: "The LCMC metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." 
+ repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: chen2009local + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: qnx_auc + label: Area under the QNX curve + summary: "The AU-QNX metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + description: "The AU-QNX metric at k=30 computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: lueks2011evaluate + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: qlocal + label: Local quality measure + summary: "The local quality metric computed on the co-ranking matrix between expression matrix and embedding." + description: "The local quality metric computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: lueks2011evaluate + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + - name: qglobal + label: Global quality measure + summary: "The Global quality metric computed on the co-ranking matrix between expression matrix and embedding." + description: "The Global quality metric computed on the co-ranking matrix between expression matrix and embedding." + repository_url: https://github.com/gdkrmr/coRanking/ + documentation_url: https://coranking.guido-kraemer.com/ + reference: lueks2011evaluate + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/nn_ranking.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + note: | + The original v1 implementations consisted of a lot of helper functions which were + derived from the pyDRMetrics package. 
This version uses the coRanking package + to avoid reimplementing and potentially introducing a lot of bugs in how + the various metrics are computed. + + In addition, the references for each of the metrics were looked up to + properly attribute the original authors of each of the metrics. + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ coRanking ] + - type: nextflow + directives: + label: [midtime, highmem, midcpu] diff --git a/src/tasks/dimensionality_reduction/metrics/coranking/script.R b/src/tasks/dimensionality_reduction/metrics/coranking/script.R new file mode 100644 index 0000000000..7fcce8c2f8 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/coranking/script.R @@ -0,0 +1,101 @@ +library(anndata) +library(coRanking) + +## VIASH START +par <- list( + "input_embedding" = "resources_test/dimensionality_reduction/pancreas/reduced.h5ad", + "input_solution" = "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output" = "score.h5ad" +) +## VIASH END + +cat("Read anndata objects") +input_solution <- anndata::read_h5ad(par[["input_solution"]]) +input_embedding <- anndata::read_h5ad(par[["input_embedding"]]) + +# get datasets +high_dim <- input_solution$layers[["normalized"]] +X_emb <- input_embedding$obsm[["X_emb"]] + +if (any(is.na(X_emb))) { + continuity_at_k30 <- + trustworthiness_at_k30 <- + qnx_at_k30 <- + lcmc_at_k30 <- + qnx_auc <- + qlocal <- + qglobal <- + 0 +} else { + cat("Compute pairwise distances\n") + # TODO: computing a square distance matrix is problematic for large datasets! + # TODO: should we use a different distance metric for the high_dim? + # TODO: or should we subset to the HVG? + dist_highdim <- coRanking:::euclidean(as.matrix(high_dim)) + dist_emb <- coRanking:::euclidean(as.matrix(X_emb)) + + cat("Compute ranking matrices\n") + rmat_highdim <- rankmatrix(dist_highdim, input = "dist") + rmat_emb <- rankmatrix(dist_emb, input = "dist") + + cat("Compute coranking matrix\n") + corank <- coranking(rmat_highdim, rmat_emb, "rank") + + cat("Compute metrics\n") + # Compute QNX. This is a curve indicating the percentage of points + # that are mild in- and extrusions or keep their rank. + qnx <- Q_NX(corank) + + # Calculate the local continuity meta-criterion from a co-ranking matrix. 
+ lcmc <- LCMC(corank) + + # the values of qnx are split into local and global values by kmax + kmax <- which.max(lcmc) + + # check certain quality values at k=30 + k30 <- 30 + trustworthiness_at_k30 <- coRanking:::cm.M_T(corank, k30) + continuity_at_k30 <- coRanking:::cm.M_C(corank, k30) + qnx_at_k30 <- qnx[[k30]] + lcmc_at_k30 <- lcmc[[k30]] + + # area under the QNX curve + qnx_auc <- mean(qnx) + + # local quality measure + qlocal <- mean(qnx[seq_len(kmax)]) + + # global quality measure + qglobal <- mean(qnx[-seq_len(kmax)]) +} + +cat("construct output AnnData\n") +output <- AnnData( + shape = c(0L, 0L), + uns = list( + dataset_id = input_solution$uns[["dataset_id"]], + normalization_id = input_solution$uns[["normalization_id"]], + method_id = input_embedding$uns[["method_id"]], + metric_ids = c( + "continuity_at_k30", + "trustworthiness_at_k30", + "qnx_at_k30", + "lcmc_at_k30", + "qnx_auc", + "qlocal", + "qglobal" + ), + metric_values = c( + continuity_at_k30, + trustworthiness_at_k30, + qnx_at_k30, + lcmc_at_k30, + qnx_auc, + qlocal, + qglobal + ) + ) +) + +cat("Write to file\n") +output$write_h5ad(par$output) diff --git a/src/tasks/dimensionality_reduction/metrics/density_preservation/config.vsh.yaml b/src/tasks/dimensionality_reduction/metrics/density_preservation/config.vsh.yaml new file mode 100644 index 0000000000..4b1e9f3a32 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/density_preservation/config.vsh.yaml @@ -0,0 +1,43 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "density_preservation" + info: + metrics: + - name: density_preservation + label: Density preservation + summary: "Similarity between local densities in the high-dimensional data and the reduced data." + description: | + "Similarity between local densities in the high-dimensional data and the reduced data. + This is computed as the pearson correlation of local radii with the local radii in the original data space." + reference: narayan2021assessing + min: -1 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/density.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + arguments: + - name: "--n_neighbors" + type: integer + default: 30 + description: "Number of neighbors to use for density estimation." + - name: "--seed" + type: integer + default: 42 + description: "Random seed." 
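+ # Note: script.py derives a per-cell local radius from the UMAP fuzzy graph as r_i = log(eps + sum_j(mu_ij * d_ij) / sum_j(mu_ij)) with eps = 1e-8, computes it for both the original data and the embedding, and reports the Pearson correlation between the two sets of radii as the score.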
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - scipy + - numpy + - umap-learn + - pynndescent~=0.5.11 + - type: nextflow + directives: + label: [midtime, lowmem, midcpu] diff --git a/src/tasks/dimensionality_reduction/metrics/density_preservation/script.py b/src/tasks/dimensionality_reduction/metrics/density_preservation/script.py new file mode 100644 index 0000000000..9bf44397c2 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/density_preservation/script.py @@ -0,0 +1,132 @@ + + +import anndata as ad +import numpy as np +from typing import Optional +from umap import UMAP +from scipy.stats import pearsonr + +## VIASH START +par = { + "input_embedding": "resources_test/dimensionality_reduction/pancreas/reduced.h5ad", + "input_solution": "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output": "score.h5ad", + "n_neighbors": 30, + "seed": 42, +} +## VIASH END + +# Interpreted from: +# https://github.com/lmcinnes/umap/blob/317ce81dc64aec9e279aa1374ac809d9ced236f6/umap/umap_.py#L1190-L1243 +# +# Author: Leland McInnes +# +# License: BSD 3 clause +def _calculate_radii( + X: np.ndarray, + n_neighbors: int = 30, + random_state: Optional[int] = None +) -> np.ndarray: + from umap.umap_ import fuzzy_simplicial_set + from umap.umap_ import nearest_neighbors + + (knn_indices, knn_dists, _) = nearest_neighbors( + X, + n_neighbors, + "euclidean", + {}, + False, + random_state, + verbose=False, + ) + + emb_graph, _, _, emb_dists = fuzzy_simplicial_set( + X, + n_neighbors, + random_state, + "euclidean", + {}, + knn_indices, + knn_dists, + verbose=False, + return_dists=True, + ) + + emb_graph = emb_graph.tocoo() + emb_graph.sum_duplicates() + emb_graph.eliminate_zeros() + + n_vertices = emb_graph.shape[1] + + mu_sum = np.zeros(n_vertices, dtype=np.float32) + re = np.zeros(n_vertices, dtype=np.float32) + + head = emb_graph.row + tail = emb_graph.col + for i in range(len(head)): + j = head[i] + k = tail[i] + D = emb_dists[j, k] + mu = emb_graph.data[i] + re[j] += mu * D + re[k] += mu * D + mu_sum[j] += mu + mu_sum[k] += mu + + epsilon = 1e-8 + return np.log(epsilon + (re / mu_sum)) + +def compute_density_preservation( + X_emb: np.ndarray, + high_dim: np.ndarray, + n_neighbors: int = 30, + random_state: Optional[int] = None +) -> float: + if np.any(np.isnan(X_emb)): + return 0.0 + + print("Compute local radii in original data", flush=True) + ro = _calculate_radii( + high_dim, + n_neighbors=n_neighbors, + random_state=random_state + ) + + print("Compute local radii of embedding", flush=True) + re = _calculate_radii( + X_emb, + n_neighbors=n_neighbors, + random_state=random_state + ) + + print("Compute pearson correlation", flush=True) + return pearsonr(ro, re)[0] + + +print("Load data", flush=True) +input_solution = ad.read_h5ad(par["input_solution"]) +input_embedding = ad.read_h5ad(par["input_embedding"]) + +high_dim = input_solution.layers["normalized"] +X_emb = input_embedding.obsm["X_emb"] + +density_preservation = compute_density_preservation( + X_emb=X_emb, + high_dim=high_dim, + n_neighbors=par["n_neighbors"], + random_state=par["seed"] +) + +print("Create output AnnData object", flush=True) +output = ad.AnnData( + uns={ + "dataset_id": input_solution.uns["dataset_id"], + "normalization_id": input_solution.uns["normalization_id"], + "method_id": input_embedding.uns["method_id"], + "metric_ids": [ "density_preservation" ], + "metric_values": [ 
density_preservation ] + } +) + +print("Write data to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/metrics/distance_correlation/config.vsh.yaml b/src/tasks/dimensionality_reduction/metrics/distance_correlation/config.vsh.yaml new file mode 100644 index 0000000000..b08c93db2c --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/distance_correlation/config.vsh.yaml @@ -0,0 +1,50 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: distance_correlation + info: + metrics: + - name: distance_correlation + label: Distance Correlation + summary: "Calculates the distance correlation by computing Spearman correlations between distances." + description: "Calculates the distance correlation by computing Spearman correlations between distances on the full (or processed) data matrix and the dimensionally-reduced matrix." + reference: kruskal1964mds + min: 0 + max: "+.inf" + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/distance_correlation.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + note: This metric was ported but will probably be removed soon. + - name: distance_correlation_spectral + label: Distance Correlation Spectral + summary: "Spearman correlation between all pairwise diffusion distances in the original and dimension-reduced data." + description: "Spearman correlation between all pairwise diffusion distances in the original and dimension-reduced data." + reference: coifman2006diffusion + min: 0 + max: "+.inf" + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/root_mean_square_error.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + note: This metric was ported but will probably be removed soon. + arguments: + - name: "--spectral" + type: boolean_true + description: Calculate the spectral root mean squared error. 
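+ # Note: both scores are Spearman correlations between vectors of pairwise Euclidean distances, roughly scipy.stats.spearmanr(pdist(reference), pdist(X_emb)); in script.py the reference is a 500-component truncated SVD of the normalized matrix for "distance_correlation" and a spectral layout of the UMAP graph for "distance_correlation_spectral".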
+ resources:
+ - type: python_script
+ path: script.py
+platforms:
+ - type: docker
+ image: openproblems/base_python:1.0.0
+ setup:
+ - type: python
+ packages:
+ - umap-learn
+ - scikit-learn
+ - numpy
+ - pynndescent~=0.5.11
+ - scipy
+ - type: nextflow
+ directives:
+ label: [midtime, highmem, midcpu]
diff --git a/src/tasks/dimensionality_reduction/metrics/distance_correlation/script.py b/src/tasks/dimensionality_reduction/metrics/distance_correlation/script.py
new file mode 100644
index 0000000000..5d8e325126
--- /dev/null
+++ b/src/tasks/dimensionality_reduction/metrics/distance_correlation/script.py
@@ -0,0 +1,59 @@
+import anndata as ad
+import numpy as np
+import sklearn.decomposition
+import scipy.stats
+import scipy.spatial
+from sklearn.metrics import pairwise_distances
+import umap
+import umap.spectral
+
+## VIASH START
+par = {
+ "input_embedding": "resources_test/dimensionality_reduction/pancreas/embedding.h5ad",
+ "input_solution": "resources_test/dimensionality_reduction/pancreas/solution.h5ad",
+ "output": "score.h5ad",
+}
+## VIASH END
+
+def _distance_correlation(X, X_emb):
+ high_dimensional_distance_vector = scipy.spatial.distance.pdist(X)
+ low_dimensional_distance_vector = scipy.spatial.distance.pdist(X_emb)
+ corr = scipy.stats.spearmanr(
+ low_dimensional_distance_vector, high_dimensional_distance_vector
+ )
+ return corr
+
+print("Load data", flush=True)
+input_solution = ad.read_h5ad(par["input_solution"])
+input_embedding = ad.read_h5ad(par["input_embedding"])
+
+high_dim = input_solution.layers["normalized"]
+X_emb = input_embedding.obsm["X_emb"]
+
+print("Compute distance correlation after truncated SVD", flush=True)
+n_svd = 500
+svd_emb = sklearn.decomposition.TruncatedSVD(n_svd).fit_transform(high_dim)
+dist_corr = _distance_correlation(svd_emb, X_emb).correlation
+
+#! Explicitly not changing it to use diffusion map method as this will have a positive effect on the diffusion map method for this specific metric.
+print("Compute distance correlation after spectral embedding", flush=True)
+n_comps = min(1000, min(input_solution.shape) - 2)
+umap_graph = umap.UMAP(transform_mode="graph").fit_transform(high_dim)
+spectral_emb = umap.spectral.spectral_layout(
+ high_dim, umap_graph, n_comps, random_state=np.random.default_rng()
+)
+dist_corr_spectral = _distance_correlation(spectral_emb, X_emb).correlation
+
+print("Create output AnnData object", flush=True)
+output = ad.AnnData(
+ uns={
+ "dataset_id": input_solution.uns["dataset_id"],
+ "normalization_id": input_solution.uns["normalization_id"],
+ "method_id": input_embedding.uns["method_id"],
+ "metric_ids": [ "distance_correlation", "distance_correlation_spectral" ],
+ "metric_values": [ dist_corr, dist_corr_spectral ]
+ }
+)
+
+print("Write data to file", flush=True)
+output.write_h5ad(par["output"], compression="gzip")
\ No newline at end of file
diff --git a/src/tasks/dimensionality_reduction/metrics/trustworthiness/config.vsh.yaml b/src/tasks/dimensionality_reduction/metrics/trustworthiness/config.vsh.yaml
new file mode 100644
index 0000000000..5f75fa8e26
--- /dev/null
+++ b/src/tasks/dimensionality_reduction/metrics/trustworthiness/config.vsh.yaml
@@ -0,0 +1,31 @@
+__merge__: ../../api/comp_metric.yaml
+functionality:
+ name: "trustworthiness"
+ info:
+ metrics:
+ - name: trustworthiness
+ label: Trustworthiness at k=15
+ summary: "A measurement of similarity between the rank of each point's nearest neighbors in the high-dimensional data and the reduced data."
+ description: "A measurement of similarity between the rank of each point's nearest neighbors in the high-dimensional data and the reduced data." + reference: venna2006local + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/dimensionality_reduction/metrics/trustworthiness.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + note: This metric is already included in the 'coranking' component and can be removed. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - scikit-learn + - numpy + - type: nextflow + directives: + label: [midtime, highmem, lowcpu] diff --git a/src/tasks/dimensionality_reduction/metrics/trustworthiness/script.py b/src/tasks/dimensionality_reduction/metrics/trustworthiness/script.py new file mode 100644 index 0000000000..410a0b3263 --- /dev/null +++ b/src/tasks/dimensionality_reduction/metrics/trustworthiness/script.py @@ -0,0 +1,37 @@ +import anndata as ad +import numpy as np +from sklearn import manifold + +## VIASH START +par = { + "input_embedding": "resources_test/dimensionality_reduction/pancreas/reduced.h5ad", + "input_solution": "resources_test/dimensionality_reduction/pancreas/test.h5ad", + "output": "score.h5ad", +} +## VIASH END + +print("Load data", flush=True) +input_solution = ad.read_h5ad(par["input_solution"]) +input_embedding = ad.read_h5ad(par["input_embedding"]) + +high_dim = input_solution.layers["normalized"] +X_emb = input_embedding.obsm["X_emb"] + +print("Reduce dimensionality of raw data", flush=True) +trustworthiness = manifold.trustworthiness( + high_dim, X_emb, n_neighbors=15, metric="euclidean" +) + +print("Create output AnnData object", flush=True) +output = ad.AnnData( + uns={ + "dataset_id": input_solution.uns["dataset_id"], + "normalization_id": input_solution.uns["normalization_id"], + "method_id": input_embedding.uns["method_id"], + "metric_ids": [ "trustworthiness" ], + "metric_values": [ trustworthiness ] + } +) + +print("Write data to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/process_dataset/config.vsh.yaml b/src/tasks/dimensionality_reduction/process_dataset/config.vsh.yaml new file mode 100644 index 0000000000..d6f62e0c7e --- /dev/null +++ b/src/tasks/dimensionality_reduction/process_dataset/config.vsh.yaml @@ -0,0 +1,13 @@ +__merge__: ../api/comp_process_dataset.yaml +functionality: + name: "process_dataset" + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/subset_anndata.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/dimensionality_reduction/process_dataset/script.py b/src/tasks/dimensionality_reduction/process_dataset/script.py new file mode 100644 index 0000000000..9563ed56f0 --- /dev/null +++ b/src/tasks/dimensionality_reduction/process_dataset/script.py @@ -0,0 +1,34 @@ +import sys +import anndata as ad + +## VIASH START +par = { + "input": "resources_test/common/pancreas/dataset.h5ad", + "output_dataset": "train.h5ad", + "output_solution": "test.h5ad", +} +meta = { + "functionality_name": "split_data", + "config": "src/tasks/dimensionality_reduction/process_dataset/.config.vsh.yaml" +} +## VIASH END + +# import helper functions +sys.path.append(meta['resources_dir']) +from subset_anndata import read_config_slots_info, 
subset_anndata + +print(">> Load Data", flush=True) +adata = ad.read_h5ad(par["input"]) + +print(">> Figuring out which data needs to be copied to which output file", flush=True) +slot_info = read_config_slots_info(meta["config"]) + +print(">> Creating train data", flush=True) +output_dataset = subset_anndata(adata, slot_info["output_dataset"]) + +print(">> Creating test data", flush=True) +output_solution = subset_anndata(adata, slot_info["output_solution"]) + +print(">> Writing", flush=True) +output_dataset.write_h5ad(par["output_dataset"]) +output_solution.write_h5ad(par["output_solution"]) diff --git a/src/tasks/dimensionality_reduction/resources_scripts/process_datasets.sh b/src/tasks/dimensionality_reduction/resources_scripts/process_datasets.sh new file mode 100755 index 0000000000..11e911edac --- /dev/null +++ b/src/tasks/dimensionality_reduction/resources_scripts/process_datasets.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +id: dimensionality_reduction_process_datasets +input_states: s3://openproblems-data/resources/datasets/**/state.yaml +rename_keys: 'input:output_dataset' +settings: '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}' +output_state: "$id/state.yaml" +publish_dir: s3://openproblems-data/resources/dimensionality_reduction/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + withLabel:highmem { + memory = '350GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/dimensionality_reduction/workflows/process_datasets/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels dimensionality_reduction,process_datasets \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark.sh b/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark.sh new file mode 100755 index 0000000000..5cf975d3b5 --- /dev/null +++ b/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark.sh @@ -0,0 +1,22 @@ +#!/bin/bash + +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="s3://openproblems-data/resources/dimensionality_reduction/results/${RUN_ID}" + +cat > /tmp/params.yaml << HERE +input_states: s3://openproblems-data/resources/dimensionality_reduction/datasets/**/state.yaml +rename_keys: 'input_dataset:output_dataset,input_solution:output_solution' +output_state: "state.yaml" +publish_dir: "$publish_dir" +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/dimensionality_reduction/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config src/wf_utils/labels_tw.config \ + --labels dimensionality_reduction,full \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark_test.sh b/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark_test.sh new file mode 100755 index 0000000000..be6defda0f --- /dev/null +++ b/src/tasks/dimensionality_reduction/resources_scripts/run_benchmark_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +input_states: 
s3://openproblems-data/resources_test/dimensionality_reduction/**/state.yaml +rename_keys: 'input_dataset:output_dataset,input_solution:output_solution' +output_state: "state.yaml" +publish_dir: s3://openproblems-nextflow/temp/dimensionality-reduction/ +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/dimensionality_reduction/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels dimensionality_reduction,test \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/resources_test_scripts/pancreas.sh b/src/tasks/dimensionality_reduction/resources_test_scripts/pancreas.sh new file mode 100755 index 0000000000..03ec1659b6 --- /dev/null +++ b/src/tasks/dimensionality_reduction/resources_test_scripts/pancreas.sh @@ -0,0 +1,55 @@ +#!/bin/bash +#make sure the following command has been executed +#viash ns build -q 'dimensionality_reduction|common' + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +RAW_DATA=resources_test/common +DATASET_DIR=resources_test/dimensionality_reduction + +mkdir -p $DATASET_DIR + +# process dataset +echo Running process_dataset +nextflow run . \ + -main-script target/nextflow/dimensionality_reduction/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + --input_states "$RAW_DATA/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_dataset": "$id/dataset.h5ad", "output_solution": "$id/solution.h5ad"}' \ + --publish_dir "$DATASET_DIR" \ + --output_state '$id/state.yaml' +# output_state should be moved to settings once workaround is solved + + +# run one method +viash run src/tasks/dimensionality_reduction/methods/densmap/config.vsh.yaml -- \ + --input $DATASET_DIR/pancreas/dataset.h5ad \ + --output $DATASET_DIR/pancreas/embedding.h5ad + +# run one metric +viash run src/tasks/dimensionality_reduction/metrics/distance_correlation/config.vsh.yaml -- \ + --input_embedding $DATASET_DIR/pancreas/embedding.h5ad \ + --input_solution $DATASET_DIR/pancreas/solution.h5ad \ + --output $DATASET_DIR/pancreas/score.h5ad + +# # run benchmark +# export NXF_VER=22.04.5 + +# # after having added a split dataset component +# nextflow \ +# run . 
\ +# -main-script src/tasks/dimensionality_reduction/workflows/run/main.nf \ +# -profile docker \ +# --id pancreas \ +# --input_dataset $DATASET_DIR/dataset.h5ad \ +# --input_solution $DATASET_DIR/solution.h5ad \ +# --output scores.tsv \ +# --publish_dir $DATASET_DIR/ \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/workflows/process_datasets/config.vsh.yaml b/src/tasks/dimensionality_reduction/workflows/process_datasets/config.vsh.yaml new file mode 100644 index 0000000000..d6aa723b00 --- /dev/null +++ b/src/tasks/dimensionality_reduction/workflows/process_datasets/config.vsh.yaml @@ -0,0 +1,30 @@ +functionality: + name: "process_datasets" + namespace: "dimensionality_reduction/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + __merge__: "/src/tasks/dimensionality_reduction/api/file_common_dataset.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_dataset" + __merge__: /src/tasks/dimensionality_reduction/api/file_dataset.yaml + required: true + direction: output + - name: "--output_solution" + __merge__: /src/tasks/dimensionality_reduction/api/file_solution.yaml + required: true + direction: output + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: common/check_dataset_schema + - name: dimensionality_reduction/process_dataset +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/workflows/process_datasets/main.nf b/src/tasks/dimensionality_reduction/workflows/process_datasets/main.nf new file mode 100644 index 0000000000..8d34f77e82 --- /dev/null +++ b/src/tasks/dimensionality_reduction/workflows/process_datasets/main.nf @@ -0,0 +1,54 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | check_dataset_schema.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input, + "schema": schemaYaml + ] + }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset": checks["exit_code"] == 0 ? 
state.input : null, + ] + } + ) + + // remove datasets which didn't pass the schema check + | filter { id, state -> + state.dataset != null + } + + | process_dataset.run( + fromState: [input: "dataset"], + toState: [ + output_dataset: "output_dataset", + output_solution: "output_solution" + ] + ) + + // only output the files for which an output file was specified + | setState(["output_dataset", "output_solution"]) + + emit: + output_ch +} \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/workflows/process_datasets/run_test.sh b/src/tasks/dimensionality_reduction/workflows/process_datasets/run_test.sh new file mode 100644 index 0000000000..d16cd7736f --- /dev/null +++ b/src/tasks/dimensionality_reduction/workflows/process_datasets/run_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +# Run this prior to executing this script: +# bin/viash_build -q 'batch_integration' + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +export NXF_VER=22.04.5 + +nextflow run . \ + -main-script target/nextflow/dimensionality_reduction/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --id run_test \ + --input_states "resources_test/common/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_dataset": "dataset.h5ad", "output_solution": "solution.h5ad"}' \ + --publish_dir "resources_test/dimensionality_reduction" \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/workflows/run_benchmark/config.vsh.yaml b/src/tasks/dimensionality_reduction/workflows/run_benchmark/config.vsh.yaml new file mode 100644 index 0000000000..aa751624d6 --- /dev/null +++ b/src/tasks/dimensionality_reduction/workflows/run_benchmark/config.vsh.yaml @@ -0,0 +1,82 @@ +functionality: + name: "run_benchmark" + namespace: "dimensionality_reduction/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_dataset" + __merge__: "/src/tasks/dimensionality_reduction/api/file_dataset.yaml" + required: true + direction: input + - name: "--input_solution" + __merge__: "/src/tasks/dimensionality_reduction/api/file_solution.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: score_uns.yaml + - name: "--output_method_configs" + type: file + required: true + direction: output + default: method_configs.yaml + - name: "--output_metric_configs" + type: file + required: true + direction: output + default: metric_configs.yaml + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_uns.yaml + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.yaml + - name: Methods + arguments: + - name: "--method_ids" + type: string + multiple: true + description: A list of method ids to run. If not specified, all methods will be run. 
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - type: file + path: "../../api/task_info.yaml" + dependencies: + - name: common/check_dataset_schema + - name: common/extract_metadata + - name: dimensionality_reduction/control_methods/random_features + - name: dimensionality_reduction/control_methods/spectral_features + - name: dimensionality_reduction/control_methods/true_features + - name: dimensionality_reduction/methods/densmap + - name: dimensionality_reduction/methods/diffusion_map + - name: dimensionality_reduction/methods/ivis + - name: dimensionality_reduction/methods/lmds + - name: dimensionality_reduction/methods/neuralee + - name: dimensionality_reduction/methods/pca + - name: dimensionality_reduction/methods/phate + - name: dimensionality_reduction/methods/pymde + - name: dimensionality_reduction/methods/simlr + - name: dimensionality_reduction/methods/tsne + - name: dimensionality_reduction/methods/umap + - name: dimensionality_reduction/metrics/clustering_performance + - name: dimensionality_reduction/metrics/coranking + - name: dimensionality_reduction/metrics/density_preservation + - name: dimensionality_reduction/metrics/distance_correlation + - name: dimensionality_reduction/metrics/trustworthiness + # test_resources: + # - type: nextflow_script + # path: main.nf + # entrypoint: test_wf +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/tasks/dimensionality_reduction/workflows/run_benchmark/main.nf b/src/tasks/dimensionality_reduction/workflows/run_benchmark/main.nf new file mode 100644 index 0000000000..1ba9251f9f --- /dev/null +++ b/src/tasks/dimensionality_reduction/workflows/run_benchmark/main.nf @@ -0,0 +1,210 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // construct list of methods + methods = [ + // controls + random_features, + spectral_features, + true_features, + // methods + densmap, + diffusion_map, + ivis, + lmds, + neuralee, + pca, + phate, + pymde, + simlr, + tsne, + umap + ] + + // construct list of metrics + metrics = [ + clustering_performance, + coranking, + density_preservation, + distance_correlation, + trustworthiness + ] + + + /**************************** + * EXTRACT DATASET METADATA * + ****************************/ + dataset_ch = input_ch + // store join id + | map{ id, state -> + [id, state + ["_meta": [join_id: id]]] + } + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input_solution"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + /*************************** + * RUN METHODS AND METRICS * + ***************************/ + score_ch = dataset_ch + + // run all methods + | runEach( + components: methods, + + // use the 'filter' argument to only run a method on the normalisation the component is asking for + filter: { id, state, comp -> + def norm = state.dataset_uns.normalization_id + def pref = comp.config.functionality.info.preferred_normalization + // if the preferred normalisation is none at all, + // we can pass whichever dataset we want + def norm_check = (norm == "log_cp10k" && pref == "counts") || norm == pref + def method_check = !state.method_ids || state.method_ids.contains(comp.config.functionality.name) + + method_check && norm_check + }, + + // define a new 'id' by appending the method name to the dataset id + id: { id, state, comp -> + id + "." 
+ comp.config.functionality.name + }, + + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: { id, state, comp -> + def new_args = [ + input: state.input_dataset + ] + if (comp.config.functionality.info.type == "control_method") { + new_args.input_solution = state.input_solution + } + new_args + }, + + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + method_id: comp.config.functionality.name, + method_output: output.output + ] + } + ) + + // run all metrics + | runEach( + components: metrics, + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: { id, state, comp -> + [ + input_solution: state.input_solution, + input_embedding: state.method_output + ] + }, + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + metric_id: comp.config.functionality.name, + metric_output: output.output + ] + } + ) + + /****************************** + * GENERATE OUTPUT YAML FILES * + ******************************/ + // TODO: can we store everything below in a separate helper function? + + // extract the dataset metadata + dataset_meta_ch = dataset_ch + // only keep one of the normalization methods + | filter{ id, state -> + state.dataset_uns.normalization_id == "log_cp10k" + } + | joinStates { ids, states -> + // store the dataset metadata in a file + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + uns.remove("normalization_id") + uns + } + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + ["output", [output_dataset_info: dataset_uns_file]] + } + + output_ch = score_ch + + // extract the scores + | extract_metadata.run( + key: "extract_scores", + fromState: [input: "metric_output"], + toState: { id, output, state -> + state + [ + score_uns: readYaml(output.output).uns + ] + } + ) + + | joinStates { ids, states -> + // store the method configs in a file + def method_configs = methods.collect{it.config} + def method_configs_yaml_blob = toYamlBlob(method_configs) + def method_configs_file = tempFile("method_configs.yaml") + method_configs_file.write(method_configs_yaml_blob) + + // store the metric configs in a file + def metric_configs = metrics.collect{it.config} + def metric_configs_yaml_blob = toYamlBlob(metric_configs) + def metric_configs_file = tempFile("metric_configs.yaml") + metric_configs_file.write(metric_configs_yaml_blob) + + def task_info_file = meta.resources_dir.resolve("task_info.yaml") + + // store the scores in a file + def score_uns = states.collect{it.score_uns} + def score_uns_yaml_blob = toYamlBlob(score_uns) + def score_uns_file = tempFile("score_uns.yaml") + score_uns_file.write(score_uns_yaml_blob) + + def new_state = [ + output_method_configs: method_configs_file, + output_metric_configs: metric_configs_file, + output_task_info: task_info_file, + output_scores: score_uns_file, + _meta: states[0]._meta + ] + + ["output", new_state] + } + + // merge all of the output data + | mix(dataset_meta_ch) + | joinStates{ ids, states -> + def mergedStates = states.inject([:]) { acc, m -> acc + m } + [ids[0], mergedStates] + } + + emit: + output_ch +} diff --git a/src/tasks/dimensionality_reduction/workflows/run_benchmark/run_test.sh 
b/src/tasks/dimensionality_reduction/workflows/run_benchmark/run_test.sh
new file mode 100755
index 0000000000..4bd2b01008
--- /dev/null
+++ b/src/tasks/dimensionality_reduction/workflows/run_benchmark/run_test.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+
+# get the root of the directory
+REPO_ROOT=$(git rev-parse --show-toplevel)
+
+# ensure that the command below is run from the root of the repository
+cd "$REPO_ROOT"
+
+set -e
+
+DATASETS_DIR="resources_test/dimensionality_reduction"
+OUTPUT_DIR="output/temp"
+
+if [ ! -d "$OUTPUT_DIR" ]; then
+ mkdir -p "$OUTPUT_DIR"
+fi
+
+export NXF_VER=22.04.5
+nextflow run . \
+ -main-script target/nextflow/dimensionality_reduction/workflows/run_benchmark/main.nf \
+ -profile docker \
+ -resume \
+ -entry auto \
+ -c src/wf_utils/labels_ci.config \
+ --input_states "$DATASETS_DIR/**/state.yaml" \
+ --rename_keys 'input_dataset:output_dataset,input_solution:output_solution' \
+ --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml"}' \
+ --publish_dir "$OUTPUT_DIR" \
+ --output_state "state.yaml"
\ No newline at end of file
diff --git a/src/tasks/label_projection/README.md b/src/tasks/label_projection/README.md
new file mode 100644
index 0000000000..7694bc0aa6
--- /dev/null
+++ b/src/tasks/label_projection/README.md
@@ -0,0 +1,370 @@
+# Label projection
+
+
+Automated cell type annotation from rich, labeled reference data
+
+Path:
+[`src/tasks/label_projection`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/label_projection)
+
+## Motivation
+
+A major challenge for integrating single cell datasets is creating
+matching cell type annotations for each cell. One of the most common
+strategies for annotating cell types is referred to as
+[“cluster-then-annotate”](https://www.nature.com/articles/s41576-018-0088-9)
+whereby cells are aggregated into clusters based on feature similarity
+and then manually characterized based on differential gene expression or
+previously identified marker genes. Recently, methods have emerged to
+build on this strategy and annotate cells using [known marker
+genes](https://www.nature.com/articles/s41592-019-0535-3). However,
+these strategies pose a difficulty for integrating atlas-scale datasets
+as the particular annotations may not match.
+
+## Description
+
+To ensure that the cell type labels in newly generated datasets match
+existing reference datasets, some methods align cells to a previously
+annotated [reference
+dataset](https://academic.oup.com/bioinformatics/article/35/22/4688/54802990)
+and then *project* labels from the reference to the new dataset.
+
+Here, we compare methods for annotation based on a reference dataset.
+The datasets consist of two or more samples of single cell profiles that
+have been manually annotated with matching labels. These datasets are
+then split into training and test batches, and the task of each method
+is to train a cell type classifier on the training set and project those
+labels onto the test set.
+ +## Authors & contributors + +| name | roles | +|:------------------|:-------------------| +| Nikolay Markov | author, maintainer | +| Scott Gigante | author | +| Robrecht Cannoodt | author | + +## API + +``` mermaid +flowchart LR + file_common_dataset("Common Dataset") + comp_process_dataset[/"Data processor"/] + file_train("Training data") + file_test("Test data") + file_solution("Solution") + comp_control_method[/"Control method"/] + comp_method[/"Method"/] + comp_metric[/"Metric"/] + file_prediction("Prediction") + file_score("Score") + file_common_dataset---comp_process_dataset + comp_process_dataset-->file_train + comp_process_dataset-->file_test + comp_process_dataset-->file_solution + file_train---comp_control_method + file_train---comp_method + file_test---comp_control_method + file_test---comp_method + file_solution---comp_control_method + file_solution---comp_metric + comp_control_method-->file_prediction + comp_method-->file_prediction + comp_metric-->file_score + file_prediction---comp_metric +``` + +## File format: Common Dataset + +A subset of the common dataset. + +Example file: `resources_test/common/pancreas/dataset.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["cell_type"]` | `string` | Cell type information. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## Component type: Data processor + +Path: +[`src/label_projection`](https://github.com/openproblems-bio/openproblems/tree/main/src/label_projection) + +A label projection dataset processor. + +Arguments: + +
+ +| Name | Type | Description | +|:--------------------|:-------|:-------------------------------------------| +| `--input` | `file` | A subset of the common dataset. | +| `--output_train` | `file` | (*Output*) The training data. | +| `--output_test` | `file` | (*Output*) The test data (without labels). | +| `--output_solution` | `file` | (*Output*) The solution for the test data. | + +
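As a rough, purely illustrative sketch (not the task's actual `process_dataset` component), the processor could look like the snippet below. The random 80/20 split, the `par` file names, and the omission of slot subsetting are all simplifying assumptions made only for this example.

``` python
import anndata as ad
import numpy as np

# Hypothetical parameter values; a real Viash component receives these as arguments.
par = {
    "input": "resources_test/common/pancreas/dataset.h5ad",
    "output_train": "train.h5ad",
    "output_test": "test.h5ad",
    "output_solution": "solution.h5ad",
}

adata = ad.read_h5ad(par["input"])

# Assumption for illustration only: randomly hold out 20% of the cells as test data.
rng = np.random.default_rng(0)
is_test = rng.random(adata.n_obs) < 0.2

# Expose the ground-truth annotation as obs["label"], as required by the
# Training data and Solution formats described below.
adata.obs["label"] = adata.obs["cell_type"]

output_train = adata[~is_test].copy()
output_solution = adata[is_test].copy()
output_test = output_solution.copy()
del output_test.obs["label"]  # the test data is distributed without labels

# A real component would additionally subset layers/obs/var/uns to the slots
# defined in the corresponding file_*.yaml specifications.
output_train.write_h5ad(par["output_train"], compression="gzip")
output_test.write_h5ad(par["output_test"], compression="gzip")
output_solution.write_h5ad(par["output_solution"], compression="gzip")
```

The actual splitting logic lives in the task's `process_dataset` component, which is not shown in this section.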
+ +## File format: Training data + +The training data + +Example file: `resources_test/label_projection/pancreas/train.h5ad` + +Format: + +
+ + AnnData object + obs: 'label', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:----------|:-------------------------------------------------------------------------| +| `obs["label"]` | `string` | Ground truth cell type labels. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## File format: Test data + +The test data (without labels) + +Example file: `resources_test/label_projection/pancreas/test.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:----------|:-------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## File format: Solution + +The solution for the test data + +Example file: `resources_test/label_projection/pancreas/solution.h5ad` + +Format: + +
+ + AnnData object + obs: 'label', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["label"]` | `string` | Ground truth cell type labels. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## Component type: Control method + +Path: +[`src/label_projection/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/label_projection/control_methods) + +Quality control methods for verifying the pipeline. + +Arguments: + +
+ +| Name | Type | Description | +|:-------------------|:-------|:--------------------------------| +| `--input_train` | `file` | The training data. | +| `--input_test` | `file` | The test data (without labels). | +| `--input_solution` | `file` | The solution for the test data. | +| `--output` | `file` | (*Output*) The prediction file. | + +
+ +## Component type: Method + +Path: +[`src/label_projection/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/label_projection/methods) + +A label projection method. + +Arguments: + +
+ +| Name | Type | Description | +|:----------------|:-------|:--------------------------------| +| `--input_train` | `file` | The training data. | +| `--input_test` | `file` | The test data (without labels). | +| `--output` | `file` | (*Output*) The prediction file. | + +
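To make the interface concrete, here is a minimal, hypothetical method script that fills in the arguments above; it is not one of the repository's components. It projects labels with a k-nearest-neighbour classifier on the precomputed PCA embedding and writes a file following the Prediction format described below. The `par` paths and the `"knn_example"` method id are assumptions for this sketch.

``` python
import anndata as ad
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical parameter values; a real Viash component receives these as arguments.
par = {
    "input_train": "resources_test/label_projection/pancreas/train.h5ad",
    "input_test": "resources_test/label_projection/pancreas/test.h5ad",
    "output": "prediction.h5ad",
}

# Load the training and test data
input_train = ad.read_h5ad(par["input_train"])
input_test = ad.read_h5ad(par["input_test"])

# Fit a simple classifier on the precomputed PCA embedding
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(input_train.obsm["X_pca"], input_train.obs["label"])

# Project labels onto the test cells
input_test.obs["label_pred"] = classifier.predict(input_test.obsm["X_pca"])

# Write a prediction file; in a real component the method id comes from
# meta["functionality_name"] rather than being hard-coded.
input_test.uns["method_id"] = "knn_example"
input_test.write_h5ad(par["output"], compression="gzip")
```

The task's own `knn` method, configured later in this changeset, is a full Viash component built around the same idea.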
+ +## Component type: Metric + +Path: +[`src/label_projection/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/label_projection/metrics) + +A label projection metric. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:--------------------------------| +| `--input_solution` | `file` | The solution for the test data. | +| `--input_prediction` | `file` | The prediction file. | +| `--output` | `file` | (*Output*) Metric score file. | + +
+ +## File format: Prediction + +The prediction file + +Example file: `resources_test/label_projection/pancreas/prediction.h5ad` + +Format: + +
+ + AnnData object + obs: 'label_pred' + uns: 'dataset_id', 'normalization_id', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:-------------------------------------| +| `obs["label_pred"]` | `string` | Predicted labels for the test cells. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
+ +## File format: Score + +Metric score file + +Example file: `resources_test/label_projection/pancreas/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
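Putting the Prediction and Score formats together, a minimal, hypothetical metric script might look as follows; it is not part of this changeset. It scores a prediction by plain accuracy and writes a file matching the Score format above; the `par` paths and the `"accuracy_example"` metric id are assumptions for this sketch.

``` python
import anndata as ad
import numpy as np

# Hypothetical parameter values; a real Viash component receives these as arguments.
par = {
    "input_solution": "resources_test/label_projection/pancreas/solution.h5ad",
    "input_prediction": "resources_test/label_projection/pancreas/prediction.h5ad",
    "output": "score.h5ad",
}

# Load the ground-truth labels and the predicted labels
input_solution = ad.read_h5ad(par["input_solution"])
input_prediction = ad.read_h5ad(par["input_prediction"])

# Compute a simple accuracy score (assumes cells are in the same order in both files)
accuracy = float(np.mean(
    input_solution.obs["label"].to_numpy() == input_prediction.obs["label_pred"].to_numpy()
))

# Store the result following the Score format
output = ad.AnnData(
    uns={
        "dataset_id": input_solution.uns["dataset_id"],
        "normalization_id": input_solution.uns["normalization_id"],
        "method_id": input_prediction.uns["method_id"],
        "metric_ids": ["accuracy_example"],
        "metric_values": [accuracy],
    }
)
output.write_h5ad(par["output"], compression="gzip")
```

The task's real metric components follow the same pattern, with their shared interface defined in `comp_metric.yaml`.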
+ diff --git a/src/tasks/label_projection/api/comp_control_method.yaml b/src/tasks/label_projection/api/comp_control_method.yaml new file mode 100644 index 0000000000..d32de4ab2c --- /dev/null +++ b/src/tasks/label_projection/api/comp_control_method.yaml @@ -0,0 +1,38 @@ +functionality: + namespace: "label_projection/control_methods" + info: + type: control_method + type_info: + label: Control method + summary: Quality control methods for verifying the pipeline. + description: | + This folder contains control components for the task. + These components have the same interface as the regular methods + but also receive the solution object as input. It serves as a + starting point to test the relative accuracy of new methods in + the task, and also as a quality control for the metrics defined + in the task. + arguments: + - name: "--input_train" + __merge__: file_train.yaml + direction: input + required: true + - name: "--input_test" + __merge__: file_test.yaml + direction: input + required: true + - name: "--input_solution" + __merge__: file_solution.yaml + direction: input + required: true + - name: "--output" + __merge__: file_prediction.yaml + direction: output + required: true + test_resources: + - path: /resources_test/label_projection/pancreas + dest: resources_test/label_projection/pancreas + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py \ No newline at end of file diff --git a/src/tasks/label_projection/api/comp_method.yaml b/src/tasks/label_projection/api/comp_method.yaml new file mode 100644 index 0000000000..1b7cb0dabc --- /dev/null +++ b/src/tasks/label_projection/api/comp_method.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: "label_projection/methods" + info: + type: method + type_info: + label: Method + summary: A label projection method. + description: | + A label projection method to predict the labels of a new "test" + dataset based on an annotated "training" dataset. + arguments: + - name: "--input_train" + __merge__: file_train.yaml + direction: input + required: true + - name: "--input_test" + __merge__: file_test.yaml + direction: input + required: true + - name: "--output" + __merge__: file_prediction.yaml + direction: output + required: true + test_resources: + - path: /resources_test/label_projection/pancreas + dest: resources_test/label_projection/pancreas + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/label_projection/api/comp_metric.yaml b/src/tasks/label_projection/api/comp_metric.yaml new file mode 100644 index 0000000000..ce81b0f89f --- /dev/null +++ b/src/tasks/label_projection/api/comp_metric.yaml @@ -0,0 +1,31 @@ +functionality: + namespace: "label_projection/metrics" + info: + type: metric + type_info: + label: Metric + summary: A label projection metric. + description: | + A metric for evaluating predicted labels. 
+ arguments: + - name: "--input_solution" + __merge__: file_solution.yaml + direction: input + required: true + - name: "--input_prediction" + __merge__: file_prediction.yaml + direction: input + required: true + - name: "--output" + __merge__: file_score.yaml + required: true + direction: output + test_resources: + - path: /resources_test/label_projection/pancreas + dest: resources_test/label_projection/pancreas + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib + diff --git a/src/tasks/label_projection/api/comp_process_dataset.yaml b/src/tasks/label_projection/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..03c2ea3726 --- /dev/null +++ b/src/tasks/label_projection/api/comp_process_dataset.yaml @@ -0,0 +1,32 @@ +functionality: + namespace: "label_projection" + info: + type: process_dataset + type_info: + label: Data processor + summary: A label projection dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. + arguments: + - name: "--input" + __merge__: file_common_dataset.yaml + direction: input + required: true + - name: "--output_train" + __merge__: file_train.yaml + direction: output + required: true + - name: "--output_test" + __merge__: file_test.yaml + direction: output + required: true + - name: "--output_solution" + __merge__: file_solution.yaml + direction: output + required: true + test_resources: + - path: /resources_test/common/pancreas + dest: resources_test/common/pancreas + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + diff --git a/src/tasks/label_projection/api/file_common_dataset.yaml b/src/tasks/label_projection/api/file_common_dataset.yaml new file mode 100644 index 0000000000..eeb01ffd1e --- /dev/null +++ b/src/tasks/label_projection/api/file_common_dataset.yaml @@ -0,0 +1,72 @@ +type: file +example: "resources_test/common/pancreas/dataset.h5ad" +info: + label: "Common Dataset" + summary: A subset of the common dataset. + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: cell_type + description: Cell type information + required: true + - type: string + name: batch + description: Batch information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. 
+ required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + diff --git a/src/tasks/label_projection/api/file_prediction.yaml b/src/tasks/label_projection/api/file_prediction.yaml new file mode 100644 index 0000000000..36efa87af0 --- /dev/null +++ b/src/tasks/label_projection/api/file_prediction.yaml @@ -0,0 +1,24 @@ +type: file +example: "resources_test/label_projection/pancreas/prediction.h5ad" +info: + label: "Prediction" + summary: "The prediction file" + slots: + obs: + - type: string + name: label_pred + description: Predicted labels for the test cells. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true diff --git a/src/tasks/label_projection/api/file_score.yaml b/src/tasks/label_projection/api/file_score.yaml new file mode 100644 index 0000000000..7ee5eaa8ee --- /dev/null +++ b/src/tasks/label_projection/api/file_score.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/label_projection/pancreas/score.h5ad" +info: + label: "Score" + summary: "Metric score file" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + required: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true + required: true diff --git a/src/tasks/label_projection/api/file_solution.yaml b/src/tasks/label_projection/api/file_solution.yaml new file mode 100644 index 0000000000..c7591678e0 --- /dev/null +++ b/src/tasks/label_projection/api/file_solution.yaml @@ -0,0 +1,71 @@ +type: file +example: "resources_test/label_projection/pancreas/solution.h5ad" +info: + label: "Solution" + summary: "The solution for the test data" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obs: + - type: string + name: label + description: Ground truth cell type labels + required: true + - type: string + name: batch + description: Batch information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. 
+ required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/label_projection/api/file_test.yaml b/src/tasks/label_projection/api/file_test.yaml new file mode 100644 index 0000000000..9cb2177da5 --- /dev/null +++ b/src/tasks/label_projection/api/file_test.yaml @@ -0,0 +1,43 @@ +type: file +example: "resources_test/label_projection/pancreas/test.h5ad" +info: + label: "Test data" + summary: "The test data (without labels)" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/label_projection/api/file_train.yaml b/src/tasks/label_projection/api/file_train.yaml new file mode 100644 index 0000000000..d615fc5693 --- /dev/null +++ b/src/tasks/label_projection/api/file_train.yaml @@ -0,0 +1,47 @@ +type: file +example: "resources_test/label_projection/pancreas/train.h5ad" +info: + label: "Training data" + summary: "The training data" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obs: + - type: string + name: label + description: Ground truth cell type labels + required: true + - type: string + name: batch + description: Batch information + required: true + var: + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + obsm: + - type: double + name: X_pca + description: The resulting PCA embedding. 
+ required: true
+ uns:
+ - type: string
+ name: dataset_id
+ description: "A unique identifier for the dataset"
+ required: true
+ - type: string
+ name: normalization_id
+ description: "Which normalization was used"
+ required: true
diff --git a/src/tasks/label_projection/api/task_info.yaml b/src/tasks/label_projection/api/task_info.yaml
new file mode 100644
index 0000000000..07b6b0120d
--- /dev/null
+++ b/src/tasks/label_projection/api/task_info.yaml
@@ -0,0 +1,46 @@
+name: label_projection
+label: Label projection
+v1:
+ path: openproblems/tasks/label_projection/README.md
+ commit: 817ea64a526c7251f74c9a7a6dba98e8602b94a8
+summary: Automated cell type annotation from rich, labeled reference data
+image: "thumbnail.svg"
+motivation: |
+ A major challenge for integrating single cell datasets is creating matching
+ cell type annotations for each cell. One of the most common strategies for
+ annotating cell types is referred to as
+ ["cluster-then-annotate"](https://www.nature.com/articles/s41576-018-0088-9)
+ whereby cells are aggregated into clusters based on feature similarity and
+ then manually characterized based on differential gene expression or previously
+ identified marker genes. Recently, methods have emerged to build on this
+ strategy and annotate cells using
+ [known marker genes](https://www.nature.com/articles/s41592-019-0535-3).
+ However, these strategies pose a difficulty for integrating atlas-scale
+ datasets as the particular annotations may not match.
+description: |
+ To ensure that the cell type labels in newly generated datasets match
+ existing reference datasets, some methods align cells to a previously
+ annotated [reference dataset](https://academic.oup.com/bioinformatics/article/35/22/4688/54802990)
+ and then _project_ labels from the reference to the new dataset.
+
+ Here, we compare methods for annotation based on a reference dataset.
+ The datasets consist of two or more samples of single cell profiles that
+ have been manually annotated with matching labels. These datasets are then
+ split into training and test batches, and the task of each method is to
+ train a cell type classifier on the training set and project those labels
+ onto the test set.
+authors: + - name: "Nikolay Markov" + roles: [ author, maintainer ] + info: + github: mxposed + - name: "Scott Gigante" + roles: [ author ] + info: + github: scottgigante + orcid: "0000-0002-4544-2764" + - name: Robrecht Cannoodt + roles: [ author ] + info: + github: rcannood + orcid: "0000-0003-3641-729X" \ No newline at end of file diff --git a/src/tasks/label_projection/api/thumbnail.svg b/src/tasks/label_projection/api/thumbnail.svg new file mode 100644 index 0000000000..3a0c47b5c2 --- /dev/null +++ b/src/tasks/label_projection/api/thumbnail.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/src/tasks/label_projection/control_methods/majority_vote/config.vsh.yaml b/src/tasks/label_projection/control_methods/majority_vote/config.vsh.yaml new file mode 100644 index 0000000000..8f0915a1dd --- /dev/null +++ b/src/tasks/label_projection/control_methods/majority_vote/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "majority_vote" + info: + label: Majority Vote + summary: "A control-type method that predicts all cells to belong to the most abundant cell type in the dataset" + description: "A control-type method that predicts all cells to belong to the most abundant cell type in the dataset" + v1: + path: openproblems/tasks/label_projection/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + variants: + majority_vote: + preferred_normalization: counts + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/label_projection/control_methods/majority_vote/script.py b/src/tasks/label_projection/control_methods/majority_vote/script.py new file mode 100644 index 0000000000..0fc6446f0d --- /dev/null +++ b/src/tasks/label_projection/control_methods/majority_vote/script.py @@ -0,0 +1,26 @@ +import anndata as ad + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo' +} +## VIASH END + +print("Load data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Compute majority vote", flush=True) +majority = input_train.obs.label.value_counts().index[0] + +print("Create prediction object", flush=True) +input_test.obs["label_pred"] = majority + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/label_projection/control_methods/random_labels/config.vsh.yaml b/src/tasks/label_projection/control_methods/random_labels/config.vsh.yaml new file mode 100644 index 0000000000..728157a644 --- /dev/null +++ b/src/tasks/label_projection/control_methods/random_labels/config.vsh.yaml @@ -0,0 +1,25 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "random_labels" + info: + label: Random Labels + summary: "a negative control, where the labels are randomly predicted." + description: "A negative control, where the labels are randomly predicted without training the data." 
+ v1: + path: openproblems/tasks/label_projection/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: counts + variants: + random_labels: + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scanpy + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/label_projection/control_methods/random_labels/script.py b/src/tasks/label_projection/control_methods/random_labels/script.py new file mode 100644 index 0000000000..a57a9d37f2 --- /dev/null +++ b/src/tasks/label_projection/control_methods/random_labels/script.py @@ -0,0 +1,33 @@ +import anndata as ad +import numpy as np + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo' +} +## VIASH END + +print("Load data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Compute label distribution", flush=True) +label_distribution = input_train.obs.label.value_counts() +label_distribution = label_distribution / label_distribution.sum() + +print("Create prediction object", flush=True) +input_test.obs["label_pred"] = np.random.choice( + label_distribution.index, + size=input_test.n_obs, + replace=True, + p=label_distribution +) + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/label_projection/control_methods/true_labels/config.vsh.yaml b/src/tasks/label_projection/control_methods/true_labels/config.vsh.yaml new file mode 100644 index 0000000000..ec536fcc7d --- /dev/null +++ b/src/tasks/label_projection/control_methods/true_labels/config.vsh.yaml @@ -0,0 +1,22 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "true_labels" + info: + label: True labels + summary: "a positive control, solution labels are copied 1 to 1 to the predicted data." + description: "A positive control, where the solution labels are copied 1 to 1 to the predicted data." 
+ v1: + path: openproblems/tasks/label_projection/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: counts + variants: + true_labels: + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/label_projection/control_methods/true_labels/script.py b/src/tasks/label_projection/control_methods/true_labels/script.py new file mode 100644 index 0000000000..dc9354c290 --- /dev/null +++ b/src/tasks/label_projection/control_methods/true_labels/script.py @@ -0,0 +1,25 @@ +import anndata as ad + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'input_solution': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo' +} +## VIASH END + +print("Load data", flush=True) +# input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) +input_solution = ad.read_h5ad(par['input_solution']) + +print("Create prediction object", flush=True) +input_test.obs["label_pred"] = input_solution.obs["label"] + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/label_projection/methods/knn/config.vsh.yaml b/src/tasks/label_projection/methods/knn/config.vsh.yaml new file mode 100644 index 0000000000..499fa69e81 --- /dev/null +++ b/src/tasks/label_projection/methods/knn/config.vsh.yaml @@ -0,0 +1,37 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "knn" + info: + label: KNN + summary: "Assumes cells with similar gene expression belong to the same cell type, and assigns an unlabelled cell the most common cell type among its k nearest neighbors in PCA space." + description: | + Using the "k-nearest neighbours" approach, which is a + popular machine learning algorithm for classification and regression tasks. + The assumption underlying KNN in this context is that cells with similar gene + expression profiles tend to belong to the same cell type. For each unlabelled + cell, this method computes the $k$ labelled cells (in this case, 5) with the + smallest distance in PCA space, and assigns that cell the most common cell + type among its $k$ nearest neighbors. 
+ reference : "cover1967nearest" + repository_url: https://github.com/scikit-learn/scikit-learn + documentation_url: "https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html" + v1: + path: openproblems/tasks/label_projection/methods/knn_classifier.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + knn_classifier_log_cp10k: + knn_classifier_scran: + preferred_normalization: log_scran_pooling + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: [scikit-learn, jsonschema] + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/label_projection/methods/knn/script.py b/src/tasks/label_projection/methods/knn/script.py new file mode 100644 index 0000000000..44b8b6f4de --- /dev/null +++ b/src/tasks/label_projection/methods/knn/script.py @@ -0,0 +1,28 @@ +import anndata as ad +import sklearn.neighbors + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Fit to train data", flush=True) +classifier = sklearn.neighbors.KNeighborsClassifier() +classifier.fit(input_train.obsm["X_pca"], input_train.obs["label"].astype(str)) + +print("Predict on test data", flush=True) +input_test.obs["label_pred"] = classifier.predict(input_test.obsm["X_pca"]) + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par['output'], compression="gzip") diff --git a/src/tasks/label_projection/methods/logistic_regression/config.vsh.yaml b/src/tasks/label_projection/methods/logistic_regression/config.vsh.yaml new file mode 100644 index 0000000000..88f4c2d5af --- /dev/null +++ b/src/tasks/label_projection/methods/logistic_regression/config.vsh.yaml @@ -0,0 +1,34 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "logistic_regression" + info: + label: Logistic Regression + summary: "Logistic Regression with 100-dimensional PCA coordinates estimates parameters for multivariate classification by minimizing cross entropy loss over cell type classes." + description: | + Logistic Regression estimates parameters of a logistic function for + multivariate classification tasks. Here, we use 100-dimensional whitened PCA + coordinates as independent variables, and the model minimises the cross + entropy loss over all cell type classes. 
+ reference: "hosmer2013applied" + repository_url: https://github.com/scikit-learn/scikit-learn + documentation_url: "https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" + v1: + path: openproblems/tasks/label_projection/methods/logistic_regression.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + logistic_regression_log_cp10k: + logistic_regression_scran: + preferred_normalization: log_scran_pooling + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scikit-learn + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/label_projection/methods/logistic_regression/script.py b/src/tasks/label_projection/methods/logistic_regression/script.py new file mode 100644 index 0000000000..e8796c1b75 --- /dev/null +++ b/src/tasks/label_projection/methods/logistic_regression/script.py @@ -0,0 +1,28 @@ +import anndata as ad +import sklearn.linear_model + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Fit to train data", flush=True) +classifier = sklearn.linear_model.LogisticRegression() +classifier.fit(input_train.obsm["X_pca"], input_train.obs["label"].astype(str)) + +print("Predict on test data", flush=True) +input_test.obs["label_pred"] = classifier.predict(input_test.obsm["X_pca"]) + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par['output'], compression="gzip") \ No newline at end of file diff --git a/src/tasks/label_projection/methods/mlp/config.vsh.yaml b/src/tasks/label_projection/methods/mlp/config.vsh.yaml new file mode 100644 index 0000000000..9c7e92fc68 --- /dev/null +++ b/src/tasks/label_projection/methods/mlp/config.vsh.yaml @@ -0,0 +1,47 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "mlp" + info: + label: Multilayer perceptron + summary: "A neural network with 100-dimensional PCA input, two hidden layers, and gradient descent weight updates to minimize cross entropy loss." + description: | + Multi-Layer Perceptron is a type of artificial neural network that + consists of multiple layers of interconnected neurons. Each neuron computes a + weighted sum of all neurons in the previous layer and transforms it with + nonlinear activation function. The output layer provides the final + prediction, and network weights are updated by gradient descent to minimize + the cross entropy loss. Here, the input data is 100-dimensional whitened PCA + coordinates for each cell, and we use two hidden layers of 100 neurons each. 
+ reference: "hinton1989connectionist" + repository_url: https://github.com/scikit-learn/scikit-learn + documentation_url: "https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html" + v1: + path: openproblems/tasks/label_projection/methods/mlp.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + mlp_log_cp10k: + mlp_scran: + preferred_normalization: log_scran_pooling + arguments: + - name: "--hidden_layer_sizes" + type: "integer" + multiple: true + description: "The ith element represents the number of neurons in the ith hidden layer." + default: [100, 100] + - name: "--max_iter" + type: "integer" + default: 1000 + description: "Maximum number of iterations" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scikit-learn + - type: nextflow + directives: + label: [midtime, midmem, lowcpu] diff --git a/src/tasks/label_projection/methods/mlp/script.py b/src/tasks/label_projection/methods/mlp/script.py new file mode 100644 index 0000000000..c98fba3954 --- /dev/null +++ b/src/tasks/label_projection/methods/mlp/script.py @@ -0,0 +1,31 @@ +import anndata as ad +from sklearn.neural_network import MLPClassifier + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Fit to train data", flush=True) +classifier = MLPClassifier( + max_iter=par["max_iter"], + hidden_layer_sizes=tuple(par["hidden_layer_sizes"]) +) +classifier.fit(input_train.obsm["X_pca"], input_train.obs["label"].astype(str)) + +print("Predict on test data", flush=True) +input_test.obs["label_pred"] = classifier.predict(input_test.obsm["X_pca"]) + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par['output'], compression="gzip") \ No newline at end of file diff --git a/src/tasks/label_projection/methods/naive_bayes/config.vsh.yaml b/src/tasks/label_projection/methods/naive_bayes/config.vsh.yaml new file mode 100644 index 0000000000..90f6e72a52 --- /dev/null +++ b/src/tasks/label_projection/methods/naive_bayes/config.vsh.yaml @@ -0,0 +1,33 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "naive_bayes" + info: + label: Naive Bayesian Classifier + summary: "Naive Bayes classification using feature probabilities to project cell type labels from a reference dataset." + description: | + Naive Bayes classification leverages probabilistic models based on Bayes' theorem + to classify cells into different types. In the context of single-cell datasets, this method + utilizes the probabilities of features to project cell type labels from a reference dataset + to new datasets. The algorithm assumes independence between features, making it computationally + efficient and well-suited for high-dimensional data. It is particularly useful for annotating + cells in atlas-scale datasets, ensuring consistency and alignment with existing reference annotations. 
+ reference: "hosmer2013applied" + repository_url: https://github.com/scikit-learn/scikit-learn + documentation_url: "https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html" + preferred_normalization: log_cp10k + variants: + naive_bayes_log_cp10k: + naive_bayes_scran: + preferred_normalization: log_scran_pooling + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scikit-learn + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/label_projection/methods/naive_bayes/script.py b/src/tasks/label_projection/methods/naive_bayes/script.py new file mode 100644 index 0000000000..542c088dca --- /dev/null +++ b/src/tasks/label_projection/methods/naive_bayes/script.py @@ -0,0 +1,28 @@ +import anndata as ad +import sklearn.naive_bayes + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +print("Fit to train data", flush=True) +classifier = sklearn.naive_bayes.GaussianNB() +classifier.fit(input_train.obsm["X_pca"], input_train.obs["label"].astype(str)) + +print("Predict on test data", flush=True) +input_test.obs["label_pred"] = classifier.predict(input_test.obsm["X_pca"]) + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par['output'], compression="gzip") \ No newline at end of file diff --git a/src/tasks/label_projection/methods/scanvi/config.vsh.yaml b/src/tasks/label_projection/methods/scanvi/config.vsh.yaml new file mode 100644 index 0000000000..6c36ead072 --- /dev/null +++ b/src/tasks/label_projection/methods/scanvi/config.vsh.yaml @@ -0,0 +1,46 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "scanvi" + info: + label: scANVI + summary: "scANVI predicts cell type labels for unlabelled test data by leveraging cell type labels, modelling uncertainty and using deep neural networks with stochastic optimization." + description: | + single-cell ANnotation using Variational Inference is a + semi-supervised variant of the scVI (Lopez et al. 2018) algorithm. Like scVI, + scANVI uses deep neural networks and stochastic optimization to model + uncertainty caused by technical noise and bias in single-cell + transcriptomics measurements. However, scANVI also leverages cell type labels + in the generative modelling. In this approach, scANVI is used to predict the + cell type labels of the unlabelled test data. + reference: "lotfollahi2020query" + repository_url: "https://github.com/scverse/scvi-tools" + documentation_url: https://scarches.readthedocs.io/en/latest/scanvi_surgery_pipeline.html + v1: + path: openproblems/tasks/label_projection/methods/scvi_tools.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + preferred_normalization: counts + variants: + scanvi_all_genes: + scanvi_hvg: + num_hvg: 2000 + arguments: + - name: "--num_hvg" + type: integer + description: "The number of HVG genes to subset to."
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_pytorch_nvidia:1.0.0 + setup: + - type: python + packages: + - scarches + - scvi-tools>=1.1.0 + - type: docker + run: | + pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + - type: nextflow + directives: + label: [midtime, midmem, highcpu, gpu] diff --git a/src/tasks/label_projection/methods/scanvi/script.py b/src/tasks/label_projection/methods/scanvi/script.py new file mode 100644 index 0000000000..d34fccd932 --- /dev/null +++ b/src/tasks/label_projection/methods/scanvi/script.py @@ -0,0 +1,78 @@ +import anndata as ad +import scarches as sca +import pandas as pd + +# followed procedure from here: +# https://scarches.readthedocs.io/en/latest/scanvi_surgery_pipeline.html + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad', + 'num_hvg': 2000 +} +meta = { + 'functionality_name': 'scanvi' +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) + +if par["num_hvg"]: + print("Subsetting to HVG", flush=True) + hvg_idx = input_train.var['hvg_score'].to_numpy().argsort()[:par["num_hvg"]] + input_train = input_train[:,hvg_idx] + input_test = input_test[:,hvg_idx] + +print("Concatenating train and test data", flush=True) +input_train.obs['is_test'] = False +input_test.obs['is_test'] = True +input_test.obs['label'] = "Unknown" +adata = ad.concat([input_train, input_test], merge = "same") +del input_train + +print("Create SCANVI model and train it on fully labelled reference dataset", flush=True) +sca.models.SCVI.setup_anndata( + adata, + batch_key="batch", + labels_key="label", + layer="counts" +) + +vae = sca.models.SCVI( + adata, + n_layers=2, + encode_covariates=True, + deeply_inject_covariates=False, + use_layer_norm="both", + use_batch_norm="none", +) + +print("Create the SCANVI model instance with ZINB loss", flush=True) +scanvae = sca.models.SCANVI.from_scvi_model(vae, unlabeled_category = "Unknown") + +print("Train SCANVI model", flush=True) +scanvae.train() + +print("Make predictions", flush=True) +preds = scanvae.predict(adata) + +print("Store outputs", flush=True) +output = ad.AnnData( + obs=pd.DataFrame( + {"label_pred": preds[adata.obs['is_test'].values]}, + index=input_test.obs.index, + ), + var=input_test.var[[]], + uns={ + "dataset_id": input_test.uns["dataset_id"], + "normalization_id": input_test.uns["normalization_id"], + "method_id": meta["functionality_name"], + }, +) + +print("Write output to file", flush=True) +output.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/label_projection/methods/scanvi_scarches/config.vsh.yaml b/src/tasks/label_projection/methods/scanvi_scarches/config.vsh.yaml new file mode 100644 index 0000000000..ccf2f449b4 --- /dev/null +++ b/src/tasks/label_projection/methods/scanvi_scarches/config.vsh.yaml @@ -0,0 +1,53 @@ +__merge__: ../../api/comp_method.yaml + +functionality: + name: scanvi_scarches + info: + label: scANVI+scArches + summary: 'Query to reference single-cell integration with transfer learning with scANVI and scArches' + description: 'scArches+scANVI or "Single-cell architecture surgery" is a deep learning method for mapping new datasets onto a pre-existing reference model, using transfer learning and parameter optimization. 
It first uses scANVI to build a reference model from the training data, and then apply scArches to map the test data onto the reference model and make predictions.' + reference: lotfollahi2020query + documentation_url: https://docs.scvi-tools.org + repository_url: https://github.com/scverse/scvi-tools + preferred_normalization: counts + v1: + path: openproblems/tasks/label_projection/methods/scvi_tools.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + variants: + scanvi_scarches: + arguments: + - name: "--n_latent" + type: "integer" + default: 30 + description: "Number of units in the latent layer" + - name: "--n_layers" + type: "integer" + default: 2 + description: "Number of hidden layers" + - name: "--n_hidden" + type: "integer" + default: 128 + description: "Number of units in the hidden layers" + - name: "--dropout_rate" + type: "double" + default: 0.2 + description: "Rate of dropout applied in training" + - name: "--max_epochs" + type: "integer" + default: 2 + description: "Maximum number of training epochs" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_pytorch_nvidia:1.0.0 + setup: + - type: python + pypi: scvi-tools>=1.1.0 + - type: docker + run: | + pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + - type: nextflow + directives: + label: [midtime, midmem, midcpu, gpu] diff --git a/src/tasks/label_projection/methods/scanvi_scarches/script.py b/src/tasks/label_projection/methods/scanvi_scarches/script.py new file mode 100644 index 0000000000..73c9c0f1fa --- /dev/null +++ b/src/tasks/label_projection/methods/scanvi_scarches/script.py @@ -0,0 +1,61 @@ +import anndata as ad +import numpy as np +import scvi + +## VIASH START +par = { + "input_train": "resources_test/label_projection/pancreas/train.h5ad", + "input_test": "resources_test/label_projection/pancreas/test.h5ad", + "output": "output.h5ad", + "n_latent": 30, + "n_layers": 2, + "n_hidden": 128, + "dropout_rate": 0.2, + "max_epochs": 200, +} +meta = {"functionality_name": "scanvi_xgboost"} +## VIASH END + +print("Reading input files", flush=True) +input_train = ad.read_h5ad(par["input_train"]) +input_test = ad.read_h5ad(par["input_test"]) +input_train.X = input_train.layers["counts"] +input_test.X = input_test.layers["counts"] + +print("Train model", flush=True) +unlabeled_category = "Unknown" + +scvi.model.SCVI.setup_anndata(input_train, batch_key="batch", labels_key="label") + +# specific scArches parameters +arches_params = dict( + use_layer_norm="both", + use_batch_norm="none", + encode_covariates=True, + dropout_rate=par["dropout_rate"], + n_hidden=par["n_hidden"], + n_layers=par["n_layers"], + n_latent=par["n_latent"], +) +scvi_model = scvi.model.SCVI(input_train, **arches_params) +train_kwargs = dict( + train_size=0.9, + early_stopping=True, +) +scvi_model.train(**train_kwargs) +model = scvi.model.SCANVI.from_scvi_model( + scvi_model, unlabeled_category=unlabeled_category +) +model.train(**train_kwargs) + +query_model = scvi.model.SCANVI.load_query_data(input_test, model) +train_kwargs = dict(max_epochs=par["max_epochs"], early_stopping=True) +query_model.train(plan_kwargs=dict(weight_decay=0.0), **train_kwargs) + +print("Generate predictions", flush=True) +input_test.obs["label"] = "Unknown" +input_test.obs["label_pred"] = query_model.predict(input_test) + +print("Write output AnnData to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] 
+input_test.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/label_projection/methods/seurat_transferdata/config.vsh.yaml b/src/tasks/label_projection/methods/seurat_transferdata/config.vsh.yaml new file mode 100644 index 0000000000..d51b532917 --- /dev/null +++ b/src/tasks/label_projection/methods/seurat_transferdata/config.vsh.yaml @@ -0,0 +1,36 @@ +__merge__: ../../api/comp_method.yaml +functionality: + status: disabled + name: "seurat_transferdata" + info: + label: Seurat TransferData + summary: "Seurat reference mapping predicts cell types for unlabelled cells using PCA distances, labelled anchors, and transfer anchors from Seurat, with SCTransform normalization." + description: | + Seurat reference mapping is a cell type label transfer method provided by the + Seurat package. Gene expression counts are first normalised by SCTransform + before computing PCA. Then it finds mutual nearest neighbours, known as + transfer anchors, between the labelled and unlabelled part of the data in PCA + space, and computes each cell's distance to each of the anchor pairs. + Finally, it uses the labelled anchors to predict cell types for unlabelled + cells based on these distances. + reference: "hao2021integrated" + repository_url: "https://github.com/satijalab/seurat" + documentation_url: "https://satijalab.org/seurat/articles/integration_mapping.html" + v1: + path: openproblems/tasks/label_projection/methods/seurat.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + preferred_normalization: log_cp10k + variants: + seurat: + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ Matrix>=1.5.3, Seurat, rlang ] + - type: nextflow + directives: + label: [midtime, highmem, highcpu] diff --git a/src/tasks/label_projection/methods/seurat_transferdata/script.R b/src/tasks/label_projection/methods/seurat_transferdata/script.R new file mode 100644 index 0000000000..999eb769ce --- /dev/null +++ b/src/tasks/label_projection/methods/seurat_transferdata/script.R @@ -0,0 +1,81 @@ +cat(">> Loading dependencies\n") +library(Matrix, warn.conflicts = FALSE) +library(anndata, warn.conflicts = FALSE) +requireNamespace("Seurat", quietly = TRUE) +library(magrittr, warn.conflicts = FALSE) + +## VIASH START +par <- list( + input_train = "resources_test/label_projection/pancreas/train.h5ad", + input_test = "resources_test/label_projection/pancreas/test.h5ad", + output = "output.h5ad" +) +## VIASH END + +packageVersion("Matrix") + +cat(">> Load input data\n") +input_train <- read_h5ad(par$input_train) +input_test <- read_h5ad(par$input_test) + +# sce_train <- zellkonverter::readH5AD(par$input_train) +# obj_train <- Seurat::as.Seurat(sce_train, data = "normalized") +# sce_test <- zellkonverter::readH5AD(par$input_test) +# obj_test <- Seurat::as.Seurat(sce_test, data = "normalized") + +cat(">> Converting AnnData to Seurat\n") +anndataToSeurat <- function(adata) { + # interpreted from https://github.com/satijalab/seurat/blob/v3.1.0/R/objects.R + obj <- + SeuratObject::CreateSeuratObject( + counts = as(Matrix::t(adata$layers[["counts"]]), "CsparseMatrix") + ) %>% + SeuratObject::SetAssayData( + slot = "data", + new.data = as(Matrix::t(adata$layers[["normalized"]]), "CsparseMatrix") + ) %>% + SeuratObject::AddMetaData( + adata$obs + ) + + # set hvg + SeuratObject::VariableFeatures(obj) <- adata$var_names[adata$var[["hvg"]]] + + # set embedding + # could add loadings and stdev + embed <- SeuratObject::CreateDimReducObject( 
+ embeddings = adata$obsm[["X_pca"]], + key = "PC_" + ) + obj[["pca"]] <- embed + + # return + obj +} + +obj_train <- anndataToSeurat(input_train) +obj_test <- anndataToSeurat(input_test) + +cat(">> Find transfer anchors\n") +npcs <- ncol(obj_train[["pca"]]) +anchors <- Seurat::FindTransferAnchors( + reference = obj_train, + query = obj_test, + npcs = npcs, + dims = seq_len(npcs), + verbose = FALSE +) + +cat(">> Predict on test data\n") +query <- Seurat::TransferData( + anchorset = anchors, + reference = obj_train, + query = obj_test, + refdata = list(labels = "label"), + verbose = FALSE +) +input_test$obs[["label_pred"]] <- query$predicted.labels[input_test$obs_names] + +cat(">> Write output to file\n") +input_test$uns[["method_id"]] <- meta[["functionality_name"]] +input_test$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/label_projection/methods/xgboost/config.vsh.yaml b/src/tasks/label_projection/methods/xgboost/config.vsh.yaml new file mode 100644 index 0000000000..516308fbdd --- /dev/null +++ b/src/tasks/label_projection/methods/xgboost/config.vsh.yaml @@ -0,0 +1,34 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "xgboost" + info: + label: XGBoost + summary: "XGBoost is a decision tree model that averages multiple trees with gradient boosting." + description: | + XGBoost is a gradient boosting decision tree model that learns multiple tree + structures in the form of a series of input features and their values, + leading to a prediction decision, and averages predictions from all its + trees. Here, input features are normalised gene expression values. + reference: "chen2016xgboost" + repository_url: "https://github.com/dmlc/xgboost" + documentation_url: "https://xgboost.readthedocs.io/en/stable/index.html" + v1: + path: openproblems/tasks/label_projection/methods/xgboost.py + commit: e3be930c6d4bbd656ab1e656badb52bb50e6cdd6 + preferred_normalization: log_cp10k + variants: + xgboost_log_cp10k: + xgboost_scran: + preferred_normalization: log_scran_pooling + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: xgboost + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/label_projection/methods/xgboost/script.py b/src/tasks/label_projection/methods/xgboost/script.py new file mode 100644 index 0000000000..c56eae59d5 --- /dev/null +++ b/src/tasks/label_projection/methods/xgboost/script.py @@ -0,0 +1,39 @@ +import anndata as ad +import xgboost as xgb + +## VIASH START +par = { + 'input_train': 'resources_test/label_projection/pancreas/train.h5ad', + 'input_test': 'resources_test/label_projection/pancreas/test.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'foo', +} +## VIASH END + +print("Load input data", flush=True) +input_train = ad.read_h5ad(par['input_train']) +input_test = ad.read_h5ad(par['input_test']) +input_layer = "normalized" + +print("Transform into integers", flush=True) +input_train.obs["label_int"] = input_train.obs["label"].cat.codes +categories = input_train.obs["label"].cat.categories + +print("Convert AnnDatas into datasets", flush=True) +xg_train = xgb.DMatrix(input_train.layers[input_layer], label=input_train.obs["label_int"]) +xg_test = xgb.DMatrix(input_test.layers[input_layer]) + +print("Fit on train data", flush=True) +param = {'objective': 'multi:softmax', 'num_class': len(categories)} +watchlist = [(xg_train, "train")] +xgb_op = xgb.train(param, xg_train, 
evals=watchlist) + +print("Predict on test data", flush=True) +pred = xgb_op.predict(xg_test).astype(int) +input_test.obs["label_pred"] = categories[pred] + +print("Write output to file", flush=True) +input_test.uns["method_id"] = meta["functionality_name"] +input_test.write_h5ad(par['output'], compression="gzip") \ No newline at end of file diff --git a/src/tasks/label_projection/metrics/accuracy/config.vsh.yaml b/src/tasks/label_projection/metrics/accuracy/config.vsh.yaml new file mode 100644 index 0000000000..8fc7021ffa --- /dev/null +++ b/src/tasks/label_projection/metrics/accuracy/config.vsh.yaml @@ -0,0 +1,28 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "accuracy" + info: + metrics: + - name: accuracy + label: Accuracy + summary: "The percentage of correctly predicted labels." + description: "The percentage of correctly predicted labels." + min: 0 + max: 1 + maximize: true + reference: grandini2020metrics + v1: + path: openproblems/tasks/label_projection/metrics/accuracy.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scikit-learn + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/label_projection/metrics/accuracy/script.py b/src/tasks/label_projection/metrics/accuracy/script.py new file mode 100644 index 0000000000..80795111d5 --- /dev/null +++ b/src/tasks/label_projection/metrics/accuracy/script.py @@ -0,0 +1,36 @@ +import numpy as np +import sklearn.preprocessing +import anndata as ad + +## VIASH START +par = { + 'input_prediction': 'resources_test/label_projection/pancreas/knn.h5ad', + 'input_solution': 'resources_test/label_projection/pancreas/solution.h5ad', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'accuracy' +} +## VIASH END + +print("Load data", flush=True) +input_prediction = ad.read_h5ad(par['input_prediction']) +input_solution = ad.read_h5ad(par['input_solution']) + +assert (input_prediction.obs_names == input_solution.obs_names).all(), "obs_names not the same in prediction and solution inputs" + +print("Encode labels", flush=True) +cats = list(input_solution.obs["label"].dtype.categories) + list(input_prediction.obs["label_pred"].dtype.categories) +encoder = sklearn.preprocessing.LabelEncoder().fit(cats) +input_solution.obs["label"] = encoder.transform(input_solution.obs["label"]) +input_prediction.obs["label_pred"] = encoder.transform(input_prediction.obs["label_pred"]) + +print("Compute prediction accuracy", flush=True) +accuracy = np.mean(input_solution.obs["label"] == input_prediction.obs["label_pred"]) + +print("Store metric value", flush=True) +input_prediction.uns["metric_ids"] = "accuracy" +input_prediction.uns["metric_values"] = accuracy + +print("Writing adata to file", flush=True) +input_prediction.write_h5ad(par['output'], compression="gzip") diff --git a/src/tasks/label_projection/metrics/f1/config.vsh.yaml b/src/tasks/label_projection/metrics/f1/config.vsh.yaml new file mode 100644 index 0000000000..f5abc0caa6 --- /dev/null +++ b/src/tasks/label_projection/metrics/f1/config.vsh.yaml @@ -0,0 +1,50 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "f1" + info: + metrics: + - name: f1_weighted + label: F1 weighted + summary: "Average of the per-label F1 scores, weighted by support" + description: "Calculates the F1 score for each label, and finds their average weighted by support (the number of true
instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall." + reference: grandini2020metrics + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/label_projection/metrics/f1.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + - name: f1_macro + label: F1 macro + summary: "Unweighted mean of each label F1-score" + description: "Calculates the F1 score for each label, and find their unweighted mean. This does not take label imbalance into account." + reference: grandini2020metrics + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/label_projection/metrics/f1.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + - name: f1_micro + label: F1 micro + summary: "Calculation of TP, FN and FP." + description: "Calculates the F1 score globally by counting the total true positives, false negatives and false positives." + reference: grandini2020metrics + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/label_projection/metrics/f1.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: scikit-learn + - type: nextflow + directives: + label: [midtime, midmem, midcpu] diff --git a/src/tasks/label_projection/metrics/f1/script.py b/src/tasks/label_projection/metrics/f1/script.py new file mode 100644 index 0000000000..4d4b1a2395 --- /dev/null +++ b/src/tasks/label_projection/metrics/f1/script.py @@ -0,0 +1,43 @@ +from sklearn.metrics import f1_score +import sklearn.preprocessing +import anndata as ad + +## VIASH START +par = { + 'input_prediction': 'resources_test/label_projection/pancreas/knn.h5ad', + 'input_solution': 'resources_test/label_projection/pancreas/solution.h5ad', + 'average': 'weighted', + 'output': 'output.h5ad' +} +meta = { + 'functionality_name': 'f1' +} +## VIASH END + +print("Load data", flush=True) +input_prediction = ad.read_h5ad(par['input_prediction']) +input_solution = ad.read_h5ad(par['input_solution']) + +assert (input_prediction.obs_names == input_solution.obs_names).all(), "obs_names not the same in prediction and solution inputs" + +print("Encode labels", flush=True) +cats = list(input_solution.obs["label"].dtype.categories) + list(input_prediction.obs["label_pred"].dtype.categories) +encoder = sklearn.preprocessing.LabelEncoder().fit(cats) +input_solution.obs["label"] = encoder.transform(input_solution.obs["label"]) +input_prediction.obs["label_pred"] = encoder.transform(input_prediction.obs["label_pred"]) + +print("Compute F1 score", flush=True) +metric_type = [ "macro", "micro", "weighted" ] +metric_id = [ "f1_" + x for x in metric_type] +metric_value = [ f1_score( + input_solution.obs["label"], + input_prediction.obs["label_pred"], + average=x + ) for x in metric_type ] + +print("Store metric value", flush=True) +input_prediction.uns["metric_ids"] = metric_id +input_prediction.uns["metric_values"] = metric_value + +print("Writing adata to file", flush=True) +input_prediction.write_h5ad(par['output'], compression="gzip") diff --git a/src/tasks/label_projection/process_dataset/config.vsh.yaml b/src/tasks/label_projection/process_dataset/config.vsh.yaml new file mode 100644 index 0000000000..aa010876cb --- /dev/null +++ b/src/tasks/label_projection/process_dataset/config.vsh.yaml @@ -0,0 +1,31 @@ +__merge__: ../api/comp_process_dataset.yaml +functionality: + name: 
"process_dataset" + arguments: + - name: "--method" + type: "string" + description: "The process method to assign train/test." + choices: ["batch", "random"] + default: "batch" + - name: "--obs_label" + type: "string" + description: "Which .obs slot to use as label." + default: "cell_type" + - name: "--obs_batch" + type: "string" + description: "Which .obs slot to use as batch covariate." + default: "batch" + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." + example: 123 + resources: + - type: python_script + path: script.py + - path: /src/common/helper_functions/subset_anndata.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [highmem, midcpu , midtime] diff --git a/src/tasks/label_projection/process_dataset/script.py b/src/tasks/label_projection/process_dataset/script.py new file mode 100644 index 0000000000..0f2c5482b6 --- /dev/null +++ b/src/tasks/label_projection/process_dataset/script.py @@ -0,0 +1,78 @@ +import sys +import random +import numpy as np +import anndata as ad + +## VIASH START +par = { + 'input': 'resources_test/common/pancreas/dataset.h5ad', + 'method': 'batch', + 'seed': None, + 'obs_batch': 'batch', + 'obs_label': 'cell_type', + 'output_train': 'train.h5ad', + 'output_test': 'test.h5ad', + 'output_solution': 'solution.h5ad' +} +meta = { + 'resources_dir': 'src/tasks/label_projection/process_dataset', + 'config': 'src/tasks/label_projection/process_dataset/.config.vsh.yaml' +} +## VIASH END + +# import helper functions +sys.path.append(meta['resources_dir']) +from subset_anndata import read_config_slots_info, subset_anndata + +# set seed if need be +if par["seed"]: + print(f">> Setting seed to {par['seed']}") + random.seed(par["seed"]) + +print(">> Load data", flush=True) +adata = ad.read_h5ad(par["input"]) +print("input:", adata) + +print(f">> Process data using {par['method']} method") +if par["method"] == "batch": + batch_info = adata.obs[par["obs_batch"]] + batch_categories = batch_info.dtype.categories + test_batches = random.sample(list(batch_categories), 1) + is_test = [ x in test_batches for x in batch_info ] +elif par["method"] == "random": + train_ix = np.random.choice(adata.n_obs, round(adata.n_obs * 0.8), replace=False) + is_test = [ not x in train_ix for x in range(0, adata.n_obs) ] + +# subset the different adatas +print(">> Figuring which data needs to be copied to which output file", flush=True) +# use par arguments to look for label and batch value in different slots +slot_mapping = { + "obs": { + "label": par["obs_label"], + "batch": par["obs_batch"], + } +} +slot_info = read_config_slots_info(meta["config"], slot_mapping) + +print(">> Creating train data", flush=True) +output_train = subset_anndata( + adata[[not x for x in is_test]], + slot_info["output_train"] +) + +print(">> Creating test data", flush=True) +output_test = subset_anndata( + adata[is_test], + slot_info["output_test"] +) + +print(">> Creating solution data", flush=True) +output_solution = subset_anndata( + adata[is_test], + slot_info['output_solution'] +) + +print(">> Writing data", flush=True) +output_train.write_h5ad(par["output_train"]) +output_test.write_h5ad(par["output_test"]) +output_solution.write_h5ad(par["output_solution"]) diff --git a/src/tasks/label_projection/resources_scripts/process_datasets.sh b/src/tasks/label_projection/resources_scripts/process_datasets.sh new file mode 100755 index 0000000000..d5c6353ff5 --- /dev/null +++ 
b/src/tasks/label_projection/resources_scripts/process_datasets.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +id: label_projection_process_datasets +input_states: s3://openproblems-data/resources/datasets/**/state.yaml +rename_keys: 'input:output_dataset' +settings: '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad", "output_solution": "$id/solution.h5ad"}' +output_state: "$id/state.yaml" +publish_dir: s3://openproblems-data/resources/label_projection/datasets +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + withLabel:highmem { + memory = '350GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/label_projection/workflows/process_datasets/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels label_projection,process_datasets \ No newline at end of file diff --git a/src/tasks/label_projection/resources_scripts/run_benchmark.sh b/src/tasks/label_projection/resources_scripts/run_benchmark.sh new file mode 100755 index 0000000000..8733e22f52 --- /dev/null +++ b/src/tasks/label_projection/resources_scripts/run_benchmark.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="s3://openproblems-data/resources/label_projection/results/${RUN_ID}" + +cat > /tmp/params.yaml << HERE +input_states: s3://openproblems-data/resources/label_projection/datasets/**/state.yaml +rename_keys: 'input_train:output_train,input_test:output_test,input_solution:output_solution' +output_state: "state.yaml" +settings: '{"method_ids": "scanvi_scarches"}' +publish_dir: "$publish_dir" +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/label_projection/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config src/wf_utils/labels_tw.config \ + --labels label_projection,full \ No newline at end of file diff --git a/src/tasks/label_projection/resources_scripts/run_benchmark_test.sh b/src/tasks/label_projection/resources_scripts/run_benchmark_test.sh new file mode 100755 index 0000000000..caf699a384 --- /dev/null +++ b/src/tasks/label_projection/resources_scripts/run_benchmark_test.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +cat > /tmp/params.yaml << 'HERE' +input_states: s3://openproblems-data/resources_test/label_projection/**/state.yaml +rename_keys: 'input_train:output_train,input_test:output_test,input_solution:output_solution' +output_state: "state.yaml" +publish_dir: s3://openproblems-nextflow/temp/label_projection/ +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/label_projection/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels label_projection,test \ No newline at end of file diff --git a/src/tasks/label_projection/resources_test_scripts/pancreas.sh 
b/src/tasks/label_projection/resources_test_scripts/pancreas.sh new file mode 100755 index 0000000000..5a69340510 --- /dev/null +++ b/src/tasks/label_projection/resources_test_scripts/pancreas.sh @@ -0,0 +1,39 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +RAW_DATA=resources_test/common +DATASET_DIR=resources_test/label_projection + +mkdir -p $DATASET_DIR + +# process dataset +echo Running process_dataset +nextflow run . \ + -main-script target/nextflow/label_projection/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + --input_states "$RAW_DATA/**/state.yaml" \ + --rename_keys 'input:output_dataset' \ + --settings '{"output_train": "$id/train.h5ad", "output_test": "$id/test.h5ad", "output_solution": "$id/solution.h5ad"}' \ + --publish_dir "$DATASET_DIR" \ + --output_state '$id/state.yaml' +# output_state should be moved to settings once workaround is solved + +# run one method +viash run src/tasks/label_projection/methods/knn/config.vsh.yaml -- \ + --input_train $DATASET_DIR/pancreas/train.h5ad \ + --input_test $DATASET_DIR/pancreas/test.h5ad \ + --output $DATASET_DIR/pancreas/prediction.h5ad + +# run one metric +viash run src/tasks/label_projection/metrics/accuracy/config.vsh.yaml -- \ + --input_prediction $DATASET_DIR/pancreas/prediction.h5ad \ + --input_solution $DATASET_DIR/pancreas/solution.h5ad \ + --output $DATASET_DIR/pancreas/score.h5ad diff --git a/src/tasks/label_projection/workflows/process_datasets/config.vsh.yaml b/src/tasks/label_projection/workflows/process_datasets/config.vsh.yaml new file mode 100644 index 0000000000..09b2e9a829 --- /dev/null +++ b/src/tasks/label_projection/workflows/process_datasets/config.vsh.yaml @@ -0,0 +1,34 @@ +functionality: + name: "process_datasets" + namespace: "label_projection/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input" + __merge__: "/src/tasks/label_projection/api/file_common_dataset.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_train" + __merge__: /src/tasks/label_projection/api/file_train.yaml + required: true + direction: output + - name: "--output_test" + __merge__: /src/tasks/label_projection/api/file_test.yaml + required: true + direction: output + - name: "--output_solution" + __merge__: /src/tasks/label_projection/api/file_solution.yaml + required: true + direction: output + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: common/check_dataset_schema + - name: label_projection/process_dataset +platforms: + - type: nextflow diff --git a/src/tasks/label_projection/workflows/process_datasets/main.nf b/src/tasks/label_projection/workflows/process_datasets/main.nf new file mode 100644 index 0000000000..88cf24935c --- /dev/null +++ b/src/tasks/label_projection/workflows/process_datasets/main.nf @@ -0,0 +1,55 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | check_dataset_schema.run( + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input, + "schema": schemaYaml + ] 
+ }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset": checks["exit_code"] == 0 ? state.input : null, + ] + } + ) + + // remove datasets which didn't pass the schema check + | filter { id, state -> + state.dataset != null + } + + | process_dataset.run( + fromState: [ input: "dataset" ], + toState: [ + output_train: "output_train", + output_test: "output_test", + output_solution: "output_solution" + ] + ) + + // only output the files for which an output file was specified + | setState(["output_train", "output_test", "output_solution"]) + + emit: + output_ch +} diff --git a/src/tasks/label_projection/workflows/run_benchmark/config.vsh.yaml b/src/tasks/label_projection/workflows/run_benchmark/config.vsh.yaml new file mode 100644 index 0000000000..083bb47a5a --- /dev/null +++ b/src/tasks/label_projection/workflows/run_benchmark/config.vsh.yaml @@ -0,0 +1,77 @@ +functionality: + name: "run_benchmark" + namespace: "label_projection/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_train" + __merge__: /src/tasks/label_projection/api/file_train.yaml + type: file + direction: input + required: true + - name: "--input_test" + __merge__: /src/tasks/label_projection/api/file_test.yaml + type: file + direction: input + required: true + - name: "--input_solution" + __merge__: /src/tasks/label_projection/api/file_solution.yaml + type: file + direction: input + required: true + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: score_uns.yaml + - name: "--output_method_configs" + type: file + required: true + direction: output + default: method_configs.yaml + - name: "--output_metric_configs" + type: file + required: true + direction: output + default: metric_configs.yaml + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_uns.yaml + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.yaml + - name: Methods + arguments: + - name: "--method_ids" + type: string + multiple: true + description: A list of method ids to run. If not specified, all methods will be run. 
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - type: file + path: "../../api/task_info.yaml" + dependencies: + - name: common/check_dataset_schema + - name: common/extract_metadata + - name: label_projection/control_methods/true_labels + - name: label_projection/control_methods/majority_vote + - name: label_projection/control_methods/random_labels + - name: label_projection/methods/knn + - name: label_projection/methods/logistic_regression + - name: label_projection/methods/mlp + - name: label_projection/methods/scanvi + - name: label_projection/methods/scanvi_scarches + - name: label_projection/methods/xgboost + - name: label_projection/metrics/accuracy + - name: label_projection/metrics/f1 +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/tasks/label_projection/workflows/run_benchmark/main.nf b/src/tasks/label_projection/workflows/run_benchmark/main.nf new file mode 100644 index 0000000000..5dafc98d1e --- /dev/null +++ b/src/tasks/label_projection/workflows/run_benchmark/main.nf @@ -0,0 +1,200 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // construct list of methods + methods = [ + true_labels, + majority_vote, + random_labels, + knn, + logistic_regression, + mlp, + scanvi, + scanvi_scarches, + // seurat_transferdata, + xgboost + ] + + // construct list of metrics + metrics = [ + accuracy, + f1 + ] + + /**************************** + * EXTRACT DATASET METADATA * + ****************************/ + dataset_ch = input_ch + // store join id + | map{ id, state -> + [id, state + ["_meta": [join_id: id]]] + } + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input_solution"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + /*************************** + * RUN METHODS AND METRICS * + ***************************/ + score_ch = dataset_ch + + // run all methods + | runEach( + components: methods, + + // use the 'filter' argument to only run a method on the normalisation the component is asking for + filter: { id, state, comp -> + def norm = state.dataset_uns.normalization_id + def pref = comp.config.functionality.info.preferred_normalization + // if the preferred normalisation is none at all, + // we can pass whichever dataset we want + def norm_check = (norm == "log_cp10k" && pref == "counts") || norm == pref + def method_check = !state.method_ids || state.method_ids.contains(comp.config.functionality.name) + + method_check && norm_check + }, + + // define a new 'id' by appending the method name to the dataset id + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: { id, state, comp -> + def new_args = [ + input_train: state.input_train, + input_test: state.input_test + ] + if (comp.config.functionality.info.type == "control_method") { + new_args.input_solution = state.input_solution + } + new_args + }, + + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + method_id: comp.config.functionality.name, + method_output: output.output + ] + } + ) + + // run all metrics + | runEach( + components: metrics, + id: { id, state, comp -> + id + "." 
+ comp.config.functionality.name + }, + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: [ + input_solution: "input_solution", + input_prediction: "method_output" + ], + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + metric_id: comp.config.functionality.name, + metric_output: output.output + ] + } + ) + + + /****************************** + * GENERATE OUTPUT YAML FILES * + ******************************/ + // TODO: can we store everything below in a separate helper function? + + // extract the dataset metadata + dataset_meta_ch = dataset_ch + // only keep one of the normalization methods + | filter{ id, state -> + state.dataset_uns.normalization_id == "log_cp10k" + } + | joinStates { ids, states -> + // store the dataset metadata in a file + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + uns.remove("normalization_id") + uns + } + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + ["output", [output_dataset_info: dataset_uns_file]] + } + + output_ch = score_ch + + // extract the scores + | extract_metadata.run( + key: "extract_scores", + fromState: [input: "metric_output"], + toState: { id, output, state -> + state + [ + score_uns: readYaml(output.output).uns + ] + } + ) + + | joinStates { ids, states -> + // store the method configs in a file + def method_configs = methods.collect{it.config} + def method_configs_yaml_blob = toYamlBlob(method_configs) + def method_configs_file = tempFile("method_configs.yaml") + method_configs_file.write(method_configs_yaml_blob) + + // store the metric configs in a file + def metric_configs = metrics.collect{it.config} + def metric_configs_yaml_blob = toYamlBlob(metric_configs) + def metric_configs_file = tempFile("metric_configs.yaml") + metric_configs_file.write(metric_configs_yaml_blob) + + def task_info_file = meta.resources_dir.resolve("task_info.yaml") + + // store the scores in a file + def score_uns = states.collect{it.score_uns} + def score_uns_yaml_blob = toYamlBlob(score_uns) + def score_uns_file = tempFile("score_uns.yaml") + score_uns_file.write(score_uns_yaml_blob) + + def new_state = [ + output_method_configs: method_configs_file, + output_metric_configs: metric_configs_file, + output_task_info: task_info_file, + output_scores: score_uns_file, + _meta: states[0]._meta + ] + + ["output", new_state] + } + + // merge all of the output data + | mix(dataset_meta_ch) + | joinStates{ ids, states -> + def mergedStates = states.inject([:]) { acc, m -> acc + m } + [ids[0], mergedStates] + } + + emit: + output_ch +} \ No newline at end of file diff --git a/src/tasks/label_projection/workflows/run_benchmark/run_test.sh b/src/tasks/label_projection/workflows/run_benchmark/run_test.sh new file mode 100755 index 0000000000..e9c712af48 --- /dev/null +++ b/src/tasks/label_projection/workflows/run_benchmark/run_test.sh @@ -0,0 +1,31 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# export TOWER_WORKSPACE_ID=53907369739130 + +DATASETS_DIR="resources_test/label_projection" +OUTPUT_DIR="output/temp" + +if [ ! -d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +export NXF_VER=22.04.5 +nextflow run . 
\ + -main-script target/nextflow/label_projection/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --input_states "$DATASETS_DIR/**/state.yaml" \ + --rename_keys 'input_train:output_train,input_test:output_test,input_solution:output_solution' \ + --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml"}' \ + --publish_dir "$OUTPUT_DIR" \ + --output_state "state.yaml" diff --git a/src/tasks/match_modalities/README.md b/src/tasks/match_modalities/README.md new file mode 100644 index 0000000000..777f367507 --- /dev/null +++ b/src/tasks/match_modalities/README.md @@ -0,0 +1,499 @@ +# Match Modalities + + +Match cells across datasets of the same set of samples on different +technologies / modalities. + +Path: +[`src/tasks/match_modalities`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/match_modalities) + +## Motivation + +Cellular function is regulated by the complex interplay of different +types of biological molecules (DNA, RNA, proteins, etc.), which +determine the state of a cell. Several recently described technologies +allow for simultaneous measurement of different aspects of cellular +state. For example, sci-CAR \[@cao2018joint\] jointly profiles RNA +expression and chromatin accessibility on the same cell, and CITE-seq +\[@stoeckius2017simultaneous\] measures surface protein abundance and +RNA expression from each cell. These technologies enable us to better +understand cellular function; however, such datasets are still rare, and +these measurements involve trade-offs in order to profile multiple +modalities. + +Joint methods can be more expensive, lower throughput, or noisier +than measuring a single modality at a time. Therefore it is useful to +develop methods that are capable of integrating measurements of the same +biological system but obtained using different technologies on different +cells. + +## Description + +In this task, the goal is to learn a latent space where cells profiled +by different technologies in different modalities are matched if they +have the same state. We use jointly profiled data as ground truth so +that we can evaluate when the observations from the same cell acquired +using different modalities are similar. A perfect result has each of the +paired observations sharing the same coordinates in the latent space. A +method that can achieve this would be able to match datasets across +modalities to enable multimodal cellular analysis from separately +measured profiles.
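To make the evaluation idea concrete: if a method places the two profiles of the same cell at (nearly) the same coordinates, then each cell's nearest neighbour across modalities should be its true partner. The following is a toy sketch of that intuition only, not part of the benchmark; the actual metrics used in this task are `knn_auc` and `mse`, defined further below.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy "integrated" embeddings for 100 jointly profiled cells in two modalities;
# row i of both matrices corresponds to the same cell.
emb_mod1 = rng.normal(size=(100, 10))
emb_mod2 = emb_mod1 + 0.05 * rng.normal(size=(100, 10))  # near-perfect matching

# For each cell in modality 2, find its nearest neighbour in modality 1.
nn = NearestNeighbors(n_neighbors=1).fit(emb_mod1)
_, idx = nn.kneighbors(emb_mod2)

# Fraction of cells whose cross-modality nearest neighbour is their true partner.
match_rate = np.mean(idx[:, 0] == np.arange(emb_mod1.shape[0]))
print(f"fraction of correctly matched cells: {match_rate:.2f}")
```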
+ +## Authors & contributors + +| name | roles | +|:------------------|:-------------------| +| Scott Gigante | author, maintainer | +| Alex Tong | author | +| Robrecht Cannoodt | author | +| Kai Waldrant | contributor | + +## API + +``` mermaid +flowchart LR + file_common_dataset_mod1("Common dataset mod1") + comp_process_dataset[/"Data processor"/] + file_dataset_mod1("Modality 1") + file_dataset_mod2("Modality 2") + file_solution_mod1("Solution mod1") + file_solution_mod2("Solution mod1") + comp_control_method[/"Control method"/] + comp_method[/"Method"/] + comp_metric[/"Metric"/] + file_integrated_mod1("Integrated mod1") + file_integrated_mod2("Integrated mod2") + file_score("Score") + file_common_dataset_mod2("Common dataset mod2") + file_common_dataset_mod1---comp_process_dataset + comp_process_dataset-->file_dataset_mod1 + comp_process_dataset-->file_dataset_mod2 + comp_process_dataset-->file_solution_mod1 + comp_process_dataset-->file_solution_mod2 + file_dataset_mod1---comp_control_method + file_dataset_mod1---comp_method + file_dataset_mod2---comp_control_method + file_dataset_mod2---comp_method + file_solution_mod1---comp_control_method + file_solution_mod1---comp_metric + file_solution_mod2---comp_control_method + file_solution_mod2---comp_metric + comp_control_method-->file_integrated_mod1 + comp_control_method-->file_integrated_mod2 + comp_method-->file_integrated_mod1 + comp_method-->file_integrated_mod2 + comp_metric-->file_score + file_integrated_mod1---comp_metric + file_integrated_mod2---comp_metric + file_common_dataset_mod2---comp_process_dataset +``` + +## File format: Common dataset mod1 + +The first modality (RNA) of a dataset processed by the common multimodal +dataset processing pipeline. + +Example file: +`resources_test/common/scicar_cell_lines/dataset_mod1.h5ad` + +Description: + +This dataset contains both raw counts and normalized data matrices, as +well as a PCA embedding, HVG selection and a kNN graph. + +Format: + +
+ + AnnData object + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
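For reference, these slots map directly onto an `AnnData` object and can be accessed as follows (a minimal sketch using the example file listed above):

```python
import anndata as ad

# assumes the example file listed above is available locally
adata = ad.read_h5ad("resources_test/common/scicar_cell_lines/dataset_mod1.h5ad")

counts = adata.layers["counts"]          # raw counts
normalized = adata.layers["normalized"]  # normalized counts
embedding = adata.obsm["X_svd"]          # SVD embedding used by most methods
print(adata.uns["dataset_id"], adata.uns["normalization_id"])
```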
+ +## Component type: Data processor + +Path: +[`src/match_modalities`](https://github.com/openproblems-bio/openproblems/tree/main/src/match_modalities) + +A match modalities dataset processor. + +Arguments: + +
+ +| Name | Type | Description | +|:-------------------------|:-------|:---------------------------------------------------------------------------------------------------------------| +| `--input_mod1` | `file` | The first modality (RNA) of a dataset processed by the common multimodal dataset processing pipeline. | +| `--input_mod2` | `file` | The second modality (ADT or ATAC) of a dataset processed by the common multimodal dataset processing pipeline. | +| `--output_mod1` | `file` | (*Output*) The first modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--output_mod2` | `file` | (*Output*) The second modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--output_solution_mod1` | `file` | (*Output*) The ground truth information for the first modality. | +| `--output_solution_mod2` | `file` | (*Output*) The ground truth information for the second modality. | + +
+ +## File format: Modality 1 + +The first modality of a multimodal dataset. The cells of this dataset +are randomly permuted. + +Example file: +`resources_test/match_modalities/scicar_cell_lines/dataset_mod1.h5ad` + +Format: + +
+ + AnnData object + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:----------|:-------------------------------------| +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## File format: Modality 2 + +The second modality of a multimodal dataset. The cells of this dataset +are randomly permuted. + +Example file: +`resources_test/match_modalities/scicar_cell_lines/dataset_mod2.h5ad` + +Format: + +
+ + AnnData object + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:----------|:-------------------------------------| +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## File format: Solution mod1 + +The ground truth information for the first modality + +Example file: +`resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad` + +Format: + +
+ + AnnData object + obs: 'permutation_indices' + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["permutation_indices"]` | `integer` | Indices with which to revert the permutation of the cells. | +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
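The `permutation_indices` column is what metrics use to undo the random permutation applied by the data processor, so that row *i* of both modalities again refers to the same cell. A minimal sketch, mirroring the metric scripts later in this diff:

```python
import anndata as ad

# load the ground-truth solution and a method's integrated output
solution_mod1 = ad.read_h5ad("resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad")
integrated_mod1 = ad.read_h5ad("resources_test/match_modalities/scicar_cell_lines/integrated_mod1.h5ad")

# reorder the method output using the stored permutation indices,
# restoring the original (solution) cell order
integrated_mod1 = integrated_mod1[solution_mod1.obs["permutation_indices"]]
```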
+ +## File format: Solution mod2 + +The ground truth information for the second modality + +Example file: +`resources_test/match_modalities/scicar_cell_lines/solution_mod2.h5ad` + +Format: +
+ + AnnData object + obs: 'permutation_indices' + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["permutation_indices"]` | `integer` | Indices with which to revert the permutation of the cells. | +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ +## Component type: Control method + +Path: +[`src/match_modalities/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/match_modalities/control_methods) + +A multimodal data integration control method. + +Arguments: + +
+ +| Name | Type | Description | +|:------------------------|:-------|:----------------------------------------------------------------------------------------------| +| `--input_mod1` | `file` | The first modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--input_mod2` | `file` | The second modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--input_solution_mod1` | `file` | The ground truth information for the first modality. | +| `--input_solution_mod2` | `file` | The ground truth information for the second modality. | +| `--output_mod1` | `file` | (*Output*) The integrated embedding for the first modality. | +| `--output_mod2` | `file` | (*Output*) The integrated embedding for the second modality. | + +
+ +## Component type: Method + +Path: +[`src/match_modalities/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/match_modalities/methods) + +A multimodal data integration method. + +Arguments: + +
+ +| Name | Type | Description | +|:----------------|:-------|:----------------------------------------------------------------------------------------------| +| `--input_mod1` | `file` | The first modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--input_mod2` | `file` | The second modality of a multimodal dataset. The cells of this dataset are randomly permuted. | +| `--output_mod1` | `file` | (*Output*) The integrated embedding for the first modality. | +| `--output_mod2` | `file` | (*Output*) The integrated embedding for the second modality. | + +
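Concretely, a method component is a script that reads the two permuted datasets, stores an `integrated` embedding in `obsm` together with its `method_id`, and writes both files back out. Below is a minimal skeleton following the `## VIASH START`/`## VIASH END` convention used by the scripts in this diff; the method name is a hypothetical placeholder, and control methods additionally receive `--input_solution_mod1`/`--input_solution_mod2`.

```python
import anndata as ad

## VIASH START
# placeholder values filled in by viash at runtime; "my_method" is hypothetical
par = {
    "input_mod1": "resources_test/match_modalities/scicar_cell_lines/dataset_mod1.h5ad",
    "input_mod2": "resources_test/match_modalities/scicar_cell_lines/dataset_mod2.h5ad",
    "output_mod1": "output_mod1.h5ad",
    "output_mod2": "output_mod2.h5ad",
}
meta = {"functionality_name": "my_method"}
## VIASH END

adata_mod1 = ad.read_h5ad(par["input_mod1"])
adata_mod2 = ad.read_h5ad(par["input_mod2"])

# a real method computes a joint embedding here; as a stand-in, this sketch
# simply reuses the per-modality SVD embeddings unchanged
adata_mod1.obsm["integrated"] = adata_mod1.obsm["X_svd"]
adata_mod2.obsm["integrated"] = adata_mod2.obsm["X_svd"]

adata_mod1.uns["method_id"] = meta["functionality_name"]
adata_mod2.uns["method_id"] = meta["functionality_name"]
adata_mod1.write_h5ad(par["output_mod1"], compression="gzip")
adata_mod2.write_h5ad(par["output_mod2"], compression="gzip")
```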
+ +## Component type: Metric + +Path: +[`src/match_modalities/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/match_modalities/metrics) + +A multimodal data integration metric. + +Arguments: + +
+ +| Name | Type | Description | +|:--------------------------|:-------|:------------------------------------------------------| +| `--input_integrated_mod1` | `file` | The integrated embedding for the first modality. | +| `--input_integrated_mod2` | `file` | The integrated embedding for the second modality. | +| `--input_solution_mod1` | `file` | The ground truth information for the first modality. | +| `--input_solution_mod2` | `file` | The ground truth information for the second modality. | +| `--output` | `file` | (*Output*) Metric score file. | + +
+ +## File format: Integrated mod1 + +The integrated embedding for the first modality + +Example file: +`resources_test/match_modalities/scicar_cell_lines/integrated_mod1.h5ad` + +Format: + +
+ + AnnData object + obsm: 'integrated' + uns: 'dataset_id', 'normalization_id', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:-------------------------------------| +| `obsm["integrated"]` | `double` | An integrated embedding. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | Which method was used. | + +
+ +## File format: Integrated mod2 + +The integrated embedding for the second modality + +Example file: +`resources_test/match_modalities/scicar_cell_lines/integrated_mod2.h5ad` + +Format: + +
+ + AnnData object + obsm: 'integrated' + uns: 'dataset_id', 'normalization_id', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:-------------------------------------| +| `obsm["integrated"]` | `double` | An integrated embedding. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | Which method was used. | + +
+ +## File format: Score + +Metric score file + +Example file: +`resources_test/match_modalities/scicar_cell_lines/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
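A metric component assembles this file as an `AnnData` object with empty data matrices and the scores stored in `uns`, as the metric scripts in this diff do. A sketch with hypothetical placeholder values (a real metric derives them from its inputs):

```python
import anndata as ad

output = ad.AnnData(
    shape=(0, 0),  # no cells or features, only metadata
    uns={
        "dataset_id": "scicar_cell_lines",   # placeholder
        "normalization_id": "log_cp10k",     # placeholder
        "method_id": "procrustes",           # placeholder
        "metric_ids": "my_metric",           # hypothetical metric name
        "metric_values": 0.87,               # placeholder score
    },
)
output.write_h5ad("score.h5ad", compression="gzip")
```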
+ +## File format: Common dataset mod2 + +The second modality (ADT or ATAC) of a dataset processed by the common +multimodal dataset processing pipeline. + +Example file: +`resources_test/common/scicar_cell_lines/dataset_mod2.h5ad` + +Description: + +This dataset contains both raw counts and normalized data matrices, as +well as a PCA embedding, HVG selection and a kNN graph. + +Format: + +
+ + AnnData object + obsm: 'X_svd' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------------|:----------|:-------------------------------------------------------------------------------| +| `obsm["X_svd"]` | `double` | The resulting SVD PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
+ diff --git a/src/tasks/match_modalities/api/comp_control_method.yaml b/src/tasks/match_modalities/api/comp_control_method.yaml new file mode 100644 index 0000000000..446ee8a41a --- /dev/null +++ b/src/tasks/match_modalities/api/comp_control_method.yaml @@ -0,0 +1,47 @@ +functionality: + namespace: "match_modalities/control_methods" + info: + type: control_method + type_info: + label: Control method + summary: A multimodal data integration control method. + description: | + This folder contains control components for the task. + These components have the same interface as the regular methods + but also receive the solution object as input. It serves as a + starting point to test the relative accuracy of new methods in + the task, and also as a quality control for the metrics defined + in the task. + arguments: + - name: "--input_mod1" + __merge__: file_dataset_mod1.yaml + direction: input + required: true + - name: "--input_mod2" + __merge__: file_dataset_mod2.yaml + direction: input + required: true + - name: "--input_solution_mod1" + __merge__: file_solution_mod1.yaml + direction: input + required: true + - name: "--input_solution_mod2" + __merge__: file_solution_mod2.yaml + direction: input + required: true + - name: "--output_mod1" + __merge__: file_integrated_mod1.yaml + direction: output + required: true + - name: "--output_mod2" + __merge__: file_integrated_mod2.yaml + direction: output + required: true + test_resources: + - path: /resources_test/match_modalities/scicar_cell_lines + dest: resources_test/match_modalities/scicar_cell_lines + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib \ No newline at end of file diff --git a/src/tasks/match_modalities/api/comp_method.yaml b/src/tasks/match_modalities/api/comp_method.yaml new file mode 100644 index 0000000000..37a5e90b0e --- /dev/null +++ b/src/tasks/match_modalities/api/comp_method.yaml @@ -0,0 +1,34 @@ +functionality: + namespace: "match_modalities/methods" + info: + type: method + type_info: + label: Method + summary: A multimodal data integration method. + description: | + A multimodal method to integrate data. + arguments: + - name: "--input_mod1" + __merge__: file_dataset_mod1.yaml + direction: input + required: true + - name: "--input_mod2" + __merge__: file_dataset_mod2.yaml + direction: input + required: true + - name: "--output_mod1" + __merge__: file_integrated_mod1.yaml + direction: output + required: true + - name: "--output_mod2" + __merge__: file_integrated_mod2.yaml + direction: output + required: true + test_resources: + - path: /resources_test/match_modalities/scicar_cell_lines + dest: resources_test/match_modalities/scicar_cell_lines + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib diff --git a/src/tasks/match_modalities/api/comp_metric.yaml b/src/tasks/match_modalities/api/comp_metric.yaml new file mode 100644 index 0000000000..220598bbbf --- /dev/null +++ b/src/tasks/match_modalities/api/comp_metric.yaml @@ -0,0 +1,39 @@ +functionality: + namespace: "match_modalities/metrics" + info: + type: metric + type_info: + label: Metric + summary: A multimodal data integration metric. + description: | + A metric for evaluating integrated data. 
+ arguments: + - name: "--input_integrated_mod1" + __merge__: file_integrated_mod1.yaml + direction: input + required: true + - name: "--input_integrated_mod2" + __merge__: file_integrated_mod2.yaml + direction: input + required: true + - name: "--input_solution_mod1" + __merge__: file_solution_mod1.yaml + direction: input + required: true + - name: "--input_solution_mod2" + __merge__: file_solution_mod2.yaml + direction: input + required: true + - name: "--output" + __merge__: file_score.yaml + required: true + direction: output + test_resources: + - path: /resources_test/match_modalities/scicar_cell_lines + dest: resources_test/match_modalities/scicar_cell_lines + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /src/common/library.bib + diff --git a/src/tasks/match_modalities/api/comp_process_dataset.yaml b/src/tasks/match_modalities/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..a48a0957b1 --- /dev/null +++ b/src/tasks/match_modalities/api/comp_process_dataset.yaml @@ -0,0 +1,40 @@ +functionality: + namespace: "match_modalities" + info: + type: process_dataset + type_info: + label: Data processor + summary: A match modalities dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. + arguments: + - name: "--input_mod1" + __merge__: file_common_dataset_mod1.yaml + direction: input + required: true + - name: "--input_mod2" + __merge__: file_common_dataset_mod2.yaml + direction: input + required: true + - name: "--output_mod1" + __merge__: file_dataset_mod1.yaml + direction: output + required: true + - name: "--output_mod2" + __merge__: file_dataset_mod2.yaml + direction: output + required: true + - name: "--output_solution_mod1" + __merge__: file_solution_mod1.yaml + direction: output + required: true + - name: "--output_solution_mod2" + __merge__: file_solution_mod2.yaml + direction: output + required: true + test_resources: + - path: /resources_test/common/scicar_cell_lines + dest: resources_test/common/scicar_cell_lines + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + diff --git a/src/tasks/match_modalities/api/file_common_dataset_mod1.yaml b/src/tasks/match_modalities/api/file_common_dataset_mod1.yaml new file mode 100644 index 0000000000..cfb98e04ea --- /dev/null +++ b/src/tasks/match_modalities/api/file_common_dataset_mod1.yaml @@ -0,0 +1,56 @@ +type: file +example: "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad" +info: + label: "Common dataset mod1" + summary: The first modality (RNA) of a dataset processed by the common multimodal dataset processing pipeline. + description: | + This dataset contains both raw counts and normalized data matrices, + as well as a PCA embedding, HVG selection and a kNN graph. + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. 
+ required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/file_common_dataset_mod2.yaml b/src/tasks/match_modalities/api/file_common_dataset_mod2.yaml new file mode 100644 index 0000000000..c42fbf525c --- /dev/null +++ b/src/tasks/match_modalities/api/file_common_dataset_mod2.yaml @@ -0,0 +1,56 @@ +type: file +example: "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad" +info: + label: "Common dataset mod2" + summary: The second modality (ADT or ATAC) of a dataset processed by the common multimodal dataset processing pipeline. + description: | + This dataset contains both raw counts and normalized data matrices, + as well as a PCA embedding, HVG selection and a kNN graph. + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/file_dataset_mod1.yaml b/src/tasks/match_modalities/api/file_dataset_mod1.yaml new file mode 100644 index 0000000000..aece4dc975 --- /dev/null +++ b/src/tasks/match_modalities/api/file_dataset_mod1.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/dataset_mod1.h5ad" +info: + label: "Modality 1" + summary: "The first modality of a multimodal dataset. The cells of this dataset are randomly permuted." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. 
+ required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/file_dataset_mod2.yaml b/src/tasks/match_modalities/api/file_dataset_mod2.yaml new file mode 100644 index 0000000000..9c140e3de8 --- /dev/null +++ b/src/tasks/match_modalities/api/file_dataset_mod2.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/dataset_mod2.h5ad" +info: + label: "Modality 2" + summary: "The second modality of a multimodal dataset. The cells of this dataset are randomly permuted." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/file_integrated_mod1.yaml b/src/tasks/match_modalities/api/file_integrated_mod1.yaml new file mode 100644 index 0000000000..72f363de1f --- /dev/null +++ b/src/tasks/match_modalities/api/file_integrated_mod1.yaml @@ -0,0 +1,24 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/integrated_mod1.h5ad" +info: + label: "Integrated mod1" + summary: "The integrated embedding for the first modality" + slots: + obsm: + - type: double + name: integrated + description: An integrated embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "Which method was used" + required: true diff --git a/src/tasks/match_modalities/api/file_integrated_mod2.yaml b/src/tasks/match_modalities/api/file_integrated_mod2.yaml new file mode 100644 index 0000000000..644bf052d4 --- /dev/null +++ b/src/tasks/match_modalities/api/file_integrated_mod2.yaml @@ -0,0 +1,24 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/integrated_mod2.h5ad" +info: + label: "Integrated mod2" + summary: "The integrated embedding for the second modality" + slots: + obsm: + - type: double + name: integrated + description: An integrated embedding. 
+ required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "Which method was used" + required: true diff --git a/src/tasks/match_modalities/api/file_score.yaml b/src/tasks/match_modalities/api/file_score.yaml new file mode 100644 index 0000000000..7d66bde3c3 --- /dev/null +++ b/src/tasks/match_modalities/api/file_score.yaml @@ -0,0 +1,29 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/score.h5ad" +info: + label: "Score" + summary: "Metric score file" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: normalization_id + description: "Which normalization was used" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + required: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true + required: true diff --git a/src/tasks/match_modalities/api/file_solution_mod1.yaml b/src/tasks/match_modalities/api/file_solution_mod1.yaml new file mode 100644 index 0000000000..490e005e0a --- /dev/null +++ b/src/tasks/match_modalities/api/file_solution_mod1.yaml @@ -0,0 +1,58 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad" +info: + label: "Solution mod1" + summary: "The ground truth information for the first modality" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obs: + - type: integer + name: permutation_indices + description: "Indices with which to revert the permutation of the cells" + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. 
+ required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/file_solution_mod2.yaml b/src/tasks/match_modalities/api/file_solution_mod2.yaml new file mode 100644 index 0000000000..7cb21fef8e --- /dev/null +++ b/src/tasks/match_modalities/api/file_solution_mod2.yaml @@ -0,0 +1,58 @@ +type: file +example: "resources_test/match_modalities/scicar_cell_lines/solution_mod2.h5ad" +info: + label: "Solution mod1" + summary: "The ground truth information for the second modality" + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized counts + required: true + obs: + - type: integer + name: permutation_indices + description: "Indices with which to revert the permutation of the cells" + required: true + obsm: + - type: double + name: X_svd + description: The resulting SVD PCA embedding. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: normalization_id + description: "Which normalization was used" + required: true diff --git a/src/tasks/match_modalities/api/task_info.yaml b/src/tasks/match_modalities/api/task_info.yaml new file mode 100644 index 0000000000..bc5550df16 --- /dev/null +++ b/src/tasks/match_modalities/api/task_info.yaml @@ -0,0 +1,47 @@ +name: match_modalities +label: Match Modalities +summary: | + Match cells across datasets of the same set of samples on different technologies / modalities. +image: "thumbnail.svg" +motivation: | + Cellular function is regulated by the complex interplay of different types of biological + molecules (DNA, RNA, proteins, etc.), which determine the state of a cell. Several + recently described technologies allow for simultaneous measurement of different aspects + of cellular state. For example, sci-CAR [@cao2018joint] + jointly profiles RNA expression and chromatin accessibility on the same cell and + CITE-seq [@stoeckius2017simultaneous] measures + surface protein abundance and RNA expression from each cell. These technologies enable + us to better understand cellular function, however datasets are still rare and there are + tradeoffs that these measurements make for to profile multiple modalities. + + Joint methods can be more expensive or lower throughput or more noisy than measuring a + single modality at a time. Therefore it is useful to develop methods that are capable + of integrating measurements of the same biological system but obtained using different + technologies on different cells. +description: | + In this task, the goal is to learn a latent space where cells profiled by different + technologies in different modalities are matched if they have the same state. 
We use + jointly profiled data as ground truth so that we can evaluate when the observations + from the same cell acquired using different modalities are similar. A perfect result + has each of the paired observations sharing the same coordinates in the latent space. + A method that can achieve this would be able to match datasets across modalities to + enable multimodal cellular analysis from separately measured profiles. +authors: + - name: "Scott Gigante" + roles: [ author, maintainer ] + info: + github: scottgigante + orcid: "0000-0002-4544-2764" + - name: Alex Tong + roles: [ author ] + info: + github: atong01 + - name: Robrecht Cannoodt + roles: [ author ] + info: + github: rcannood + orcid: "0000-0003-3641-729X" + - name: Kai Waldrant + roles: [ contributor ] + info: + github: KaiWaldrant \ No newline at end of file diff --git a/src/tasks/match_modalities/api/thumbnail.svg b/src/tasks/match_modalities/api/thumbnail.svg new file mode 100644 index 0000000000..07e326bc4a --- /dev/null +++ b/src/tasks/match_modalities/api/thumbnail.svg @@ -0,0 +1 @@ +RNAATACdim-2dim-1dim-2dim-1 \ No newline at end of file diff --git a/src/tasks/match_modalities/control_methods/random_features/config.vsh.yaml b/src/tasks/match_modalities/control_methods/random_features/config.vsh.yaml new file mode 100644 index 0000000000..8c021c3bdf --- /dev/null +++ b/src/tasks/match_modalities/control_methods/random_features/config.vsh.yaml @@ -0,0 +1,25 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "random_features" + info: + label: Random Features + summary: "Randomly permutated features" + description: | + "Randomly permuted twice, once for use as the output for each modality, producing random features with no correlation between modalities." + preferred_normalization: log_cp10k + v1: + path: openproblems/tasks/matching_modalities/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - numpy + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] \ No newline at end of file diff --git a/src/tasks/match_modalities/control_methods/random_features/script.py b/src/tasks/match_modalities/control_methods/random_features/script.py new file mode 100644 index 0000000000..d10bb72b27 --- /dev/null +++ b/src/tasks/match_modalities/control_methods/random_features/script.py @@ -0,0 +1,32 @@ +import anndata as ad +import numpy as np + +## VIASH START + +par = { + "input_mod1": "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad", + "input_mod2": "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad", + "output_mod1": "output.mod1.h5ad", + "output_mod2": "output.mod2.h5ad", +} + +meta = { + "functionality_name": "random_features" +} + +## VIASH END + +print("Reading input h5ad file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + +print("Generating random features", flush=True) +# todo: do we actually need to permute this once more +adata_mod1.obsm["integrated"] = adata_mod1.obsm["X_svd"][np.random.permutation(np.arange(adata_mod1.shape[0]))] +adata_mod2.obsm["integrated"] = adata_mod1.obsm["X_svd"][np.random.permutation(np.arange(adata_mod1.shape[0]))] + +print("Write output to file", flush=True) +adata_mod1.uns["method_id"] = meta["functionality_name"] +adata_mod2.uns["method_id"] = meta["functionality_name"] 
+adata_mod1.write_h5ad(par["output_mod1"], compression="gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression="gzip") \ No newline at end of file diff --git a/src/tasks/match_modalities/control_methods/true_features/config.vsh.yaml b/src/tasks/match_modalities/control_methods/true_features/config.vsh.yaml new file mode 100644 index 0000000000..bc897dd821 --- /dev/null +++ b/src/tasks/match_modalities/control_methods/true_features/config.vsh.yaml @@ -0,0 +1,21 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: "true_features" + info: + label: True Features + summary: "A 1 to 1 mapping of features between modalities" + description: | + "use the same features for both modalities" + preferred_normalization: log_cp10k + v1: + path: openproblems/tasks/matching_modalities/methods/baseline.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] \ No newline at end of file diff --git a/src/tasks/match_modalities/control_methods/true_features/script.py b/src/tasks/match_modalities/control_methods/true_features/script.py new file mode 100644 index 0000000000..cf7abac8e5 --- /dev/null +++ b/src/tasks/match_modalities/control_methods/true_features/script.py @@ -0,0 +1,59 @@ +import anndata as ad +import numpy as np + +## VIASH START +par = { + "input_mod1": "resources_test/match_modalities/scicar_cell_lines/dataset_mod1.h5ad", + "input_mod2": "resources_test/match_modalities/scicar_cell_lines/dataset_mod2.h5ad", + "input_solution_mod1": "resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad", + "input_solution_mod2": "resources_test/match_modalities/scicar_cell_lines/solution_mod2.h5ad", + "output_mod1": "output.mod1.h5ad", + "output_mod2": "output.mod2.h5ad", +} +meta = { + "functionality_name": "true_features" +} +## VIASH END + +print("Reading input h5ad file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + +solution_mod1 = ad.read_h5ad(par["input_solution_mod1"]) +solution_mod2 = ad.read_h5ad(par["input_solution_mod2"]) + +print("Storing true features", flush=True) +output_mod1 = ad.AnnData( + obs=adata_mod1.obs[[]], + var=adata_mod1.var[[]], + obsm={ + "integrated": adata_mod1.obsm["X_svd"] + }, + uns={ + "dataset_id": adata_mod1.uns["dataset_id"], + "normalization_id": adata_mod1.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +# Permutate mod1 according to mod2 +mod2_obsm = adata_mod1.obsm["X_svd"][solution_mod1.obs["permutation_indices"]] +reverse_indices_mod2 = np.argsort(solution_mod2.obs["permutation_indices"]) +mod2_obsm = mod2_obsm[reverse_indices_mod2] + +output_mod2 = ad.AnnData( + obs=adata_mod2.obs[[]], + var=adata_mod2.var[[]], + obsm={ + "integrated": mod2_obsm + }, + uns={ + "dataset_id": adata_mod2.uns["dataset_id"], + "normalization_id": adata_mod2.uns["normalization_id"], + "method_id": meta["functionality_name"] + } +) + +print("Write output to file", flush=True) +output_mod1.write_h5ad(par["output_mod1"], compression="gzip") +output_mod2.write_h5ad(par["output_mod2"], compression="gzip") diff --git a/src/tasks/match_modalities/methods/fastmnn/config.vsh.yaml b/src/tasks/match_modalities/methods/fastmnn/config.vsh.yaml new file mode 100644 index 0000000000..4e143ec67b --- /dev/null +++ b/src/tasks/match_modalities/methods/fastmnn/config.vsh.yaml @@ -0,0 +1,29 @@ 
+__merge__: ../../api/comp_method.yaml +functionality: + name: "fastmnn" + info: + label: "fastMNN" + summary: "A simpler version of the original mnnCorrect algorithm." + description: | + FastMNN is a simplified version of the mnnCorrect algorithm. Both use Mutual Nearest Neighbors to integrate multimodal single-cell data. + preferred_normalization: "log_cp10k" + variants: + mnn_log_cp10k: + mnn_log_scran_pooling: + # "The normalization only changes for the first modality dataset, the second still uses log_cp10k" + preferred_normalization: "log_scran_pooling" + reference: "haghverdi2018batch" + repository_url: "https://github.com/LTLA/batchelor" + documentation_url: "https://github.com/LTLA/batchelor#readme" + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + bioc: batchelor + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/match_modalities/methods/fastmnn/script.R b/src/tasks/match_modalities/methods/fastmnn/script.R new file mode 100644 index 0000000000..129f134e16 --- /dev/null +++ b/src/tasks/match_modalities/methods/fastmnn/script.R @@ -0,0 +1,37 @@ +library(anndata, warn.conflicts = FALSE) +library(Matrix, warn.conflicts = FALSE) +requireNamespace("batchelor", quietly = TRUE) + +## VIASH START +par <- list( + input_mod1 = "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad", + input_mod2 = "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad", + output_mod1 = "output_mod1.h5ad", + output_mod2 = "output_mod2.h5ad" +) +## VIASH END + +cat("Reading input h5ad file\n") +adata_mod1 <- read_h5ad(par$input_mod1) +adata_mod2 <- read_h5ad(par$input_mod2) + +cat("Running MNN\n") +sce_mnn <- batchelor::fastMNN( + t(adata_mod1$obsm[["X_svd"]]), + t(adata_mod2$obsm[["X_svd"]]) +) + +cat("Storing output\n") +combined_recons <- t(SummarizedExperiment::assay(sce_mnn, "reconstructed")) +mode1_recons <- combined_recons[seq_len(nrow(adata_mod1$obsm[["X_svd"]])), , drop = FALSE] +mode2_recons <- combined_recons[-seq_len(nrow(adata_mod1$obsm[["X_svd"]])), , drop = FALSE] + +adata_mod1$obsm[["integrated"]] <- as.matrix(mode1_recons) +adata_mod2$obsm[["integrated"]] <- as.matrix(mode2_recons) + +cat("Writing to file\n") +adata_mod1$uns["method_id"] <- meta$functionality_name +adata_mod2$uns["method_id"] <- meta$functionality_name + +yyy <- adata_mod1$write_h5ad(par$output_mod1, compression = "gzip") +zzz <- adata_mod2$write_h5ad(par$output_mod2, compression = "gzip") diff --git a/src/tasks/match_modalities/methods/harmonic_alignment/config.vsh.yaml b/src/tasks/match_modalities/methods/harmonic_alignment/config.vsh.yaml new file mode 100644 index 0000000000..3146db56e0 --- /dev/null +++ b/src/tasks/match_modalities/methods/harmonic_alignment/config.vsh.yaml @@ -0,0 +1,38 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "harmonic_alignment" + info: + label: "Harmonic Alignment" + summary: "Harmonic Alignment" + description: | + Harmonic Alignment is a method for integrating multimodal single-cell data. It is based on the idea of aligning the eigenvectors of the Laplacian matrices of the two modalities. The alignment is achieved by solving a generalized eigenvalue problem. 
The method is described in the following paper: https://doi.org/10.1137/1.9781611976236.36 + preferred_normalization: "log_cp10k" + v1: + path: openproblems/tasks/matching_modalities/methods/harmonic_alignment.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + reference: "stanley2020harmonic" + documentation_url: "https://github.com/KrishnaswamyLab/harmonic-alignment#readme" + repository_url: "https://github.com/KrishnaswamyLab/harmonic-alignment" + arguments: + - name: "--n_pca_XY" + type: "integer" + default: 100 + description: "Default number of principal components on which to build graph." + - name: "--n_eigenvectors" + type: "integer" + default: 100 + description: "Number of eigenvectors of the normalized Laplacian on which to perform alignment." + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + github: + - KrishnaswamyLab/harmonic-alignment#subdirectory=python + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] + diff --git a/src/tasks/match_modalities/methods/harmonic_alignment/script.py b/src/tasks/match_modalities/methods/harmonic_alignment/script.py new file mode 100644 index 0000000000..abe2eece7c --- /dev/null +++ b/src/tasks/match_modalities/methods/harmonic_alignment/script.py @@ -0,0 +1,48 @@ +import anndata as ad +import harmonicalignment + +## VIASH START +par = { + "mod1" : "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad", + "mod2" : "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad", + "output" : "output.scot.h5ad", + "n_pca_XY" : 100, + "eigenvectors" : 100 +} +meta = { + "functionality_name" : "harmonic_alignment" +} +## VIASH END + + +print("Reading input h5ad file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + +print("Check parameters", flush=True) +n_eigenvectors = par["n_eigenvectors"] +n_pca_XY = par["n_pca_XY"] + +if adata_mod1.layers["normalized"].shape[0] <= n_eigenvectors: + n_eigenvectors = None +if adata_mod1.layers["normalized"].shape[0] <= n_pca_XY: + n_pca_XY = None + + +print("Running Harmonic Alignment", flush=True) +ha_op = harmonicalignment.HarmonicAlignment( + n_filters=8, n_pca_XY=n_pca_XY, n_eigenvectors=n_eigenvectors +) +ha_op.align(adata_mod1.obsm["X_svd"], adata_mod2.obsm["X_svd"]) +XY_aligned = ha_op.diffusion_map(n_eigenvectors=n_eigenvectors) + +print("Storing output data structures", flush=True) + +adata_mod1.obsm["integrated"] = XY_aligned[: adata_mod1.obsm["X_svd"].shape[0]] +adata_mod2.obsm["integrated"] = XY_aligned[-adata_mod2.obsm["X_svd"].shape[0] :] + +print("Write output to file", flush=True) +adata_mod1.uns["method_id"] = meta["functionality_name"] +adata_mod2.uns["method_id"] = meta["functionality_name"] +adata_mod1.write_h5ad(par["output_mod1"], compression = "gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression = "gzip") diff --git a/src/tasks/match_modalities/methods/procrustes/config.vsh.yaml b/src/tasks/match_modalities/methods/procrustes/config.vsh.yaml new file mode 100644 index 0000000000..db7b49383b --- /dev/null +++ b/src/tasks/match_modalities/methods/procrustes/config.vsh.yaml @@ -0,0 +1,29 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "procrustes" + info: + label: Procrustes + summary: | + "Procrustes superimposition embeds cellular data from each modality into a common space." 
+ description: | + "Procrustes superimposition embeds cellular data from each modality into a common space by aligning the 100-dimensional SVD embeddings to one another by using an isomorphic transformation that minimizes the root mean squared distance between points. The unmodified SVD embedding and the transformed second modality are used as output for the task." + v1: + path: openproblems/tasks/matching_modalities/methods/procrustes.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + reference: gower1975generalized + documentation_url: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.procrustes.html + repository_url: https://github.com/scipy/scipy + preferred_normalization: "log_cp10k" + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + pypi: + - scipy + - type: nextflow + directives: + label: [midtime, midmem, midcpu] \ No newline at end of file diff --git a/src/tasks/match_modalities/methods/procrustes/script.py b/src/tasks/match_modalities/methods/procrustes/script.py new file mode 100644 index 0000000000..fad63fa658 --- /dev/null +++ b/src/tasks/match_modalities/methods/procrustes/script.py @@ -0,0 +1,34 @@ +import anndata as ad +import scipy.spatial + +## VIASH START + +par = { + "input_mod1" : "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad", + "input_mod2" : "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad", + "output_mod1" : "output.mod1.h5ad", + "output_mod2" : "output.mod2.h5ad", +} + +meta = { + "functionality_name" : "procrustes" +} + +## VIASH END + +print("Reading input h5ad file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + +print("procrustes alignment", flush=True) +X_proc, Y_proc, _ = scipy.spatial.procrustes(adata_mod1.obsm["X_svd"], adata_mod2.obsm["X_svd"]) + +print("Storing output data", flush=True) +adata_mod1.obsm["integrated"] = X_proc +adata_mod2.obsm["integrated"] = Y_proc + +print("Write output to file", flush=True) +adata_mod1.uns["method_id"] = meta["functionality_name"] +adata_mod2.uns["method_id"] = meta["functionality_name"] +adata_mod1.write_h5ad(par["output_mod1"], compression = "gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression = "gzip") diff --git a/src/tasks/match_modalities/methods/scot/config.vsh.yaml b/src/tasks/match_modalities/methods/scot/config.vsh.yaml new file mode 100644 index 0000000000..e86fe4438a --- /dev/null +++ b/src/tasks/match_modalities/methods/scot/config.vsh.yaml @@ -0,0 +1,30 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: "scot" + info: + label: "Single Cell Optimal Transport" + description: | + Single Cell Optimal Transport (SCOT) is a method for integrating multimodal single-cell data. It is based on the idea of aligning the distributions of the two modalities using optimal transport. + summary: "Run Single Cell Optimal Transport" + preferred_normalization: "log_cp10k" + reference: Demetci2020scot + documentation_url: "https://github.com/rsinghlab/SCOT#readme" + repository_url: "https://github.com/rsinghlab/SCOT" + arguments: + - name: "--balanced" + type: "boolean_true" + description: "Determines whether balanced or unbalanced optimal transport. In the balanced case, the target and source distributions are assumed to have equal mass." 
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: apt + packages: git + - type: docker + run: "cd /opt && git clone --depth 1 https://github.com/rsinghlab/SCOT.git && cd SCOT && pip install -r requirements.txt" + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/match_modalities/methods/scot/script.py b/src/tasks/match_modalities/methods/scot/script.py new file mode 100644 index 0000000000..d6e629c565 --- /dev/null +++ b/src/tasks/match_modalities/methods/scot/script.py @@ -0,0 +1,45 @@ +import anndata as ad +import sys +sys.path.append("/opt/SCOT/src/") +import scotv1 +import pandas as pd + +# importing helper functions from common preprocessing.py file in resources dir +import sys + + +## VIASH START +par = { + "input_mod1" : "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad", + "input_mod2" : "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad", + "output_mod1" : "integrated_mod1.h5ad", + "output_mod2" : "integrated_mod2.h5ad", + "balanced":False, +} +## VIASH END + + +print("Reading input h5ad file", flush=True) +adata_mod1 = ad.read_h5ad(par["input_mod1"]) +adata_mod2 = ad.read_h5ad(par["input_mod2"]) + + +print("Initialize SCOT", flush=True) +scot = scotv1.SCOT(adata_mod1.obsm["X_svd"], adata_mod2.obsm["X_svd"]) + +print("Call the unbalanced alignment", flush=True) +# From https://github.com/rsinghlab/SCOT/blob/master/examples/unbalanced_GW_SNAREseq.ipynb # noqa: 501 +X_new_unbal, y_new_unbal = scot.align( + k=50, e=1e-3, normalize=True +) + + +print("store output", flush=True) +adata_mod1.obsm["integrated"] = X_new_unbal +adata_mod2.obsm["integrated"] = y_new_unbal + +print("Write output to file", flush=True) +adata_mod1.uns["method_id"] = meta["functionality_name"] +adata_mod2.uns["method_id"] = meta["functionality_name"] +adata_mod1.write_h5ad(par["output_mod1"], compression = "gzip") +adata_mod2.write_h5ad(par["output_mod2"], compression = "gzip") diff --git a/src/tasks/match_modalities/metrics/knn_auc/config.vsh.yaml b/src/tasks/match_modalities/metrics/knn_auc/config.vsh.yaml new file mode 100644 index 0000000000..e7067a20b5 --- /dev/null +++ b/src/tasks/match_modalities/metrics/knn_auc/config.vsh.yaml @@ -0,0 +1,36 @@ +__merge__: ../../api/comp_metric.yaml +functionality: + name: "knn_auc" + info: + metrics: + - label: kNN Area Under the Curve + name: knn_auc + summary: "Compute the kNN Area Under the Curve" + description: | + Let $f(i) \in F$ be the scRNA-seq measurement of cell $i$, and $g(i) \in G$ be the scATAC- seq measurement of cell $i$. kNN-AUC calculates the average percentage overlap of neighborhoods of $f(i)$ in $F$ with neighborhoods of $g(i)$ in $G$. Higher is better. + reference: "lance2022multimodal" + min: 0 + max: 1 + maximize: true + v1: + path: openproblems/tasks/matching_modalities/metrics/knn_auc.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + arguments: + - name: "--proportion_neighbors" + type: "double" + default: 0.1 + description: The proportion of neighbours to use in computing the KNN. 
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - numpy + - scikit-learn + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/match_modalities/metrics/knn_auc/script.py b/src/tasks/match_modalities/metrics/knn_auc/script.py new file mode 100644 index 0000000000..cf5c14b473 --- /dev/null +++ b/src/tasks/match_modalities/metrics/knn_auc/script.py @@ -0,0 +1,75 @@ +import anndata as ad +import numpy as np +import sklearn.decomposition +import sklearn.neighbors + +## VIASH START +par = { + "input_integrated_mod1": "resources_test/match_modalities/scicar_cell_lines/integrated_mod1.h5ad", + "input_integrated_mod2": "resources_test/match_modalities/scicar_cell_lines/integrated_mod2.h5ad", + "input_solution_mod1": "resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad", + "input_solution_mod2": "resources_test/match_modalities/scicar_cell_lines/solution_mod2.h5ad", + "output": "resources_test/multimodal/score.h5ad", + "proportion_neighbors": 0.1, +} +meta = { + "functionality_name": "knn_auc" +} +## VIASH END + +print("Reading adata file", flush=True) +input_solution_mod1 = ad.read_h5ad(par["input_solution_mod1"]) +input_solution_mod2 = ad.read_h5ad(par["input_solution_mod2"]) + +input_integrated_mod1 = ad.read_h5ad(par["input_integrated_mod1"])[input_solution_mod1.obs["permutation_indices"]] +input_integrated_mod2 = ad.read_h5ad(par["input_integrated_mod2"])[input_solution_mod2.obs["permutation_indices"]] + +print("Checking parameters", flush=True) +n_neighbors = int(np.ceil(par["proportion_neighbors"] * input_solution_mod1.n_obs)) + +print("Compute KNN on PCA", flush=True) +_, indices_true = ( + sklearn.neighbors.NearestNeighbors(n_neighbors=n_neighbors) + .fit(input_solution_mod1.obsm["X_svd"]) + .kneighbors(input_solution_mod1.obsm["X_svd"]) +) + +_, indices_pred = ( + sklearn.neighbors.NearestNeighbors(n_neighbors=n_neighbors) + .fit(input_integrated_mod1.obsm["integrated"]) + .kneighbors(input_integrated_mod2.obsm["integrated"]) +) + +print("Check which neighbours match", flush=True) +neighbors_match = np.zeros(n_neighbors, dtype=int) +for i in range(input_solution_mod1.n_obs): + _, pred_matches, true_matches = np.intersect1d( + indices_pred[i], indices_true[i], return_indices=True + ) + neighbors_match_idx = np.maximum(pred_matches, true_matches) + neighbors_match += np.sum( + np.arange(n_neighbors) >= neighbors_match_idx[:, None], + axis=0, + ) + +print("Compute area under neighbours match curve", flush=True) +neighbors_match_curve = neighbors_match / ( + np.arange(1, n_neighbors + 1) * input_solution_mod1.n_obs +) +area_under_curve = np.mean(neighbors_match_curve) + +print("Store metric value", flush=True) +uns = { + "dataset_id": input_solution_mod1.uns["dataset_id"], + "normalization_id": input_solution_mod1.uns["normalization_id"], + "method_id": input_integrated_mod1.uns["method_id"], + "metric_ids": "knn_auc", + "metric_values": area_under_curve +} +output_metric = ad.AnnData( + shape=(0,0), + uns=uns +) + +print("Writing adata to file", flush=True) +output_metric.write_h5ad(par["output"], compression = "gzip") diff --git a/src/tasks/match_modalities/metrics/mse/config.vsh.yaml b/src/tasks/match_modalities/metrics/mse/config.vsh.yaml new file mode 100644 index 0000000000..b1dfc15746 --- /dev/null +++ b/src/tasks/match_modalities/metrics/mse/config.vsh.yaml @@ -0,0 +1,32 @@ +__merge__: ../../api/comp_metric.yaml 
+functionality: + name: "mse" + info: + metrics: + - label: "Mean Squared Error" + name: "mse" + summary: Compute the mean squared error. + description: | + Mean squared error (MSE) is the average distance between each pair of matched observations of the same cell in the learned latent space. Lower is better. + reference: "lance2022multimodal" + maximize: false + min: 0 + max: "+.inf" + v1: + path: openproblems/tasks/matching_modalities/metrics/mse.py + commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32 + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + setup: + - type: python + packages: + - numpy<2 + - scipy + - scprep + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/match_modalities/metrics/mse/script.py b/src/tasks/match_modalities/metrics/mse/script.py new file mode 100644 index 0000000000..b03487c6eb --- /dev/null +++ b/src/tasks/match_modalities/metrics/mse/script.py @@ -0,0 +1,56 @@ +import anndata as ad +import numpy as np +from scipy import sparse + +## VIASH START +par = { + "input_integrated_mod1": "resources_test/match_modalities/scicar_cell_lines/integrated_mod1.h5ad", + "input_integrated_mod2": "resources_test/match_modalities/scicar_cell_lines/integrated_mod2.h5ad", + "input_solution_mod1": "resources_test/match_modalities/scicar_cell_lines/solution_mod1.h5ad", + "input_solution_mod2": "resources_test/match_modalities/scicar_cell_lines/solution_mod2.h5ad", + "output": "resources_test/multimodal/score.h5ad", +} +meta = { + "functionality_name": "knn_auc" +} +## VIASH END + +print("Reading adata file", flush=True) +input_solution_mod1 = ad.read_h5ad(par["input_solution_mod1"]) +input_solution_mod2 = ad.read_h5ad(par["input_solution_mod2"]) + +input_integrated_mod1 = ad.read_h5ad(par["input_integrated_mod1"])[input_solution_mod1.obs["permutation_indices"]] +input_integrated_mod2 = ad.read_h5ad(par["input_integrated_mod2"])[input_solution_mod2.obs["permutation_indices"]] + +print("Computing MSE", flush=True) +def _square(X): + if sparse.issparse(X): + X.data = X.data ** 2 + return X + else: + return X ** 2 + + +X = input_integrated_mod1.obsm["integrated"].toarray() +Y = input_integrated_mod2.obsm["integrated"].toarray() + +X_shuffled = X[np.random.permutation(np.arange(X.shape[0])), :] +error_random = np.mean(np.sum(_square(X_shuffled - Y))) +error_abs = np.mean(np.sum(_square(X - Y))) +metric_value = (error_abs / error_random).item() + +print("Store metric value", flush=True) +uns = { + "dataset_id": input_solution_mod1.uns["dataset_id"], + "normalization_id": input_solution_mod1.uns["normalization_id"], + "method_id": input_integrated_mod1.uns["method_id"], + "metric_ids": "mse", + "metric_values": metric_value +} +output_metric = ad.AnnData( + shape=(0,0), + uns=uns +) + +print("Writing adata to file", flush=True) +output_metric.write_h5ad(par["output"], compression = "gzip") diff --git a/src/tasks/match_modalities/process_dataset/config.vsh.yaml b/src/tasks/match_modalities/process_dataset/config.vsh.yaml new file mode 100644 index 0000000000..35dc757809 --- /dev/null +++ b/src/tasks/match_modalities/process_dataset/config.vsh.yaml @@ -0,0 +1,18 @@ +__merge__: ../api/comp_process_dataset.yaml +functionality: + name: "process_dataset" + arguments: + - name: "--seed" + type: "integer" + description: "A seed for the subsampling." 
+      example: 123
+  resources:
+    - type: python_script
+      path: script.py
+    - path: /src/common/helper_functions/subset_anndata.py
+platforms:
+  - type: docker
+    image: openproblems/base_python:1.0.0
+  - type: nextflow
+    directives:
+      label: [highmem, midcpu, midtime]
diff --git a/src/tasks/match_modalities/process_dataset/script.py b/src/tasks/match_modalities/process_dataset/script.py
new file mode 100644
index 0000000000..d90d5e3965
--- /dev/null
+++ b/src/tasks/match_modalities/process_dataset/script.py
@@ -0,0 +1,63 @@
+import sys
+import numpy as np
+import anndata as ad
+
+## VIASH START
+par = {
+    "input_mod1": "resources_test/common/scicar_cell_lines/dataset_mod1.h5ad",
+    "input_mod2": "resources_test/common/scicar_cell_lines/dataset_mod2.h5ad",
+    "output_mod1": "output_mod1.h5ad",
+    "output_mod2": "output_mod2.h5ad",
+    "output_solution_mod1": "output_solution_mod1.h5ad",
+    "output_solution_mod2": "output_solution_mod2.h5ad",
+    "seed": 123
+}
+meta = {
+    "resources_dir": "src/common/helper_functions/",
+    "config": "src/tasks/match_modalities/process_dataset/.config.vsh.yaml"
+}
+## VIASH END
+
+# import helper functions
+sys.path.append(meta["resources_dir"])
+from subset_anndata import read_config_slots_info, subset_anndata
+
+# set seed if need be (the permutations below use numpy's global RNG)
+if par["seed"]:
+    print(f">> Setting seed to {par['seed']}")
+    np.random.seed(par["seed"])
+
+print(">> Load data", flush=True)
+input_mod1 = ad.read_h5ad(par["input_mod1"])
+input_mod2 = ad.read_h5ad(par["input_mod2"])
+
+print(">> Permute input data", flush=True)
+mod1_perm = np.random.permutation(np.arange(input_mod1.n_obs))
+mod2_perm = np.random.permutation(np.arange(input_mod2.n_obs))
+
+output_mod1 = input_mod1[mod1_perm]
+output_mod1.obs_names = [f"cell_mod1_{i}" for i in range(output_mod1.n_obs)]
+output_mod2 = input_mod2[mod2_perm]
+output_mod2.obs_names = [f"cell_mod2_{i}" for i in range(output_mod2.n_obs)]
+
+print(">> Create solution objects", flush=True)
+output_solution_mod1 = input_mod1.copy()
+output_solution_mod1.obs["permutation_indices"] = np.argsort(mod1_perm)
+output_solution_mod2 = input_mod2.copy()
+output_solution_mod2.obs["permutation_indices"] = np.argsort(mod2_perm)
+
+# subset the different adatas
+print(">> Read slot info from config file", flush=True)
+slot_info = read_config_slots_info(meta["config"])
+
+print(">> Subset anndatas", flush=True)
+output_mod1 = subset_anndata(output_mod1, slot_info["output_mod1"])
+output_mod2 = subset_anndata(output_mod2, slot_info["output_mod2"])
+output_solution_mod1 = subset_anndata(output_solution_mod1, slot_info["output_solution_mod1"])
+output_solution_mod2 = subset_anndata(output_solution_mod2, slot_info["output_solution_mod2"])
+
+print(">> Writing data", flush=True)
+output_mod1.write_h5ad(par["output_mod1"])
+output_mod2.write_h5ad(par["output_mod2"])
+output_solution_mod1.write_h5ad(par["output_solution_mod1"])
+output_solution_mod2.write_h5ad(par["output_solution_mod2"])
diff --git a/src/tasks/match_modalities/resources_scripts/process_datasets.sh b/src/tasks/match_modalities/resources_scripts/process_datasets.sh
new file mode 100755
index 0000000000..e5796bd641
--- /dev/null
+++ b/src/tasks/match_modalities/resources_scripts/process_datasets.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+cat > /tmp/params.yaml << 'HERE'
+id: match_modalities_process_datasets
+input_states: s3://openproblems-data/resources/datasets/openproblems_v1_multimodal/**/state.yaml
+rename_keys: 'input_mod1:output_mod1,input_mod2:output_mod2'
+settings: '{"output_mod1": "$id/output_mod1.h5ad", "output_mod2":
"$id/output_mod2.h5ad", "output_solution_mod1": "$id/output_solution_mod1.h5ad", "output_solution_mod2": "$id/output_solution_mod2.h5ad"}' +output_state: "$id/state.yaml" +publish_dir: s3://openproblems-data/resources/match_modalities/datasets/openproblems_v1_multimodal +HERE + +cat > /tmp/nextflow.config << HERE +process { + executor = 'awsbatch' + withName:'.*publishStatesProc' { + memory = '16GB' + disk = '100GB' + } + withLabel:highmem { + memory = '350GB' + } +} +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/match_modalities/workflows/process_datasets/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config /tmp/nextflow.config \ + --labels match_modalities,process_datasets diff --git a/src/tasks/match_modalities/resources_scripts/run_benchmark.sh b/src/tasks/match_modalities/resources_scripts/run_benchmark.sh new file mode 100755 index 0000000000..41789c6a0f --- /dev/null +++ b/src/tasks/match_modalities/resources_scripts/run_benchmark.sh @@ -0,0 +1,23 @@ +#!/bin/bash + +RUN_ID="run_$(date +%Y-%m-%d_%H-%M-%S)" +publish_dir="s3://openproblems-data/resources/match_modalities/results/${RUN_ID}" + +cat > /tmp/params.yaml << HERE +id: match_modalities +input_states: s3://openproblems-data/resources/match_modalities/datasets/**/state.yaml +rename_keys: 'input_mod1:output_mod1,input_mod2:output_mod2,input_solution_mod1:output_solution_mod1,input_solution_mod2:output_solution_mod2' +output_state: "state.yaml" +publish_dir: "$publish_dir" +HERE + +tw launch https://github.com/openproblems-bio/openproblems.git \ + --revision main_build \ + --pull-latest \ + --main-script target/nextflow/match_modalities/workflows/run_benchmark/main.nf \ + --workspace 53907369739130 \ + --compute-env 6TeIFgV5OY4pJCk8I0bfOh \ + --params-file /tmp/params.yaml \ + --entry-name auto \ + --config src/wf_utils/labels_tw.config \ + --labels match_modalities,full \ No newline at end of file diff --git a/src/tasks/match_modalities/resources_test_scripts/scicar_cell_lines.sh b/src/tasks/match_modalities/resources_test_scripts/scicar_cell_lines.sh new file mode 100755 index 0000000000..6a35138815 --- /dev/null +++ b/src/tasks/match_modalities/resources_test_scripts/scicar_cell_lines.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +RAW_DATA=resources_test/common +DATASET_DIR=resources_test/match_modalities + +mkdir -p $DATASET_DIR + +# process dataset +echo Running process_dataset +nextflow run . 
\ + -main-script target/nextflow/match_modalities/workflows/process_datasets/main.nf \ + -profile docker \ + -entry auto \ + --input_states "$RAW_DATA/**/state.yaml" \ + --rename_keys 'input_mod1:output_mod1,input_mod2:output_mod2' \ + --settings '{"output_mod1": "$id/dataset_mod1.h5ad", "output_mod2": "$id/dataset_mod2.h5ad", "output_solution_mod1": "$id/solution_mod1.h5ad", "output_solution_mod2": "$id/solution_mod2.h5ad"}' \ + --publish_dir "$DATASET_DIR" \ + --output_state '$id/state.yaml' +# output_state should be moved to settings once workaround is solved + +# run one method +viash run src/tasks/match_modalities/methods/fastmnn/config.vsh.yaml -- \ + --input_mod1 $DATASET_DIR/scicar_cell_lines/dataset_mod1.h5ad \ + --input_mod2 $DATASET_DIR/scicar_cell_lines/dataset_mod2.h5ad \ + --output_mod1 $DATASET_DIR/scicar_cell_lines/integrated_mod1.h5ad \ + --output_mod2 $DATASET_DIR/scicar_cell_lines/integrated_mod2.h5ad diff --git a/src/tasks/match_modalities/workflows/process_datasets/config.vsh.yaml b/src/tasks/match_modalities/workflows/process_datasets/config.vsh.yaml new file mode 100644 index 0000000000..5427343f9f --- /dev/null +++ b/src/tasks/match_modalities/workflows/process_datasets/config.vsh.yaml @@ -0,0 +1,42 @@ +functionality: + name: "process_datasets" + namespace: "match_modalities/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_mod1" + __merge__: "/src/tasks/match_modalities/api/file_common_dataset_mod1.yaml" + required: true + direction: input + - name: "--input_mod2" + __merge__: "/src/tasks/match_modalities/api/file_common_dataset_mod2.yaml" + required: true + direction: input + - name: Outputs + arguments: + - name: "--output_mod1" + __merge__: /src/tasks/match_modalities/api/file_dataset_mod1.yaml + required: true + direction: output + - name: "--output_mod2" + __merge__: /src/tasks/match_modalities/api/file_dataset_mod2.yaml + required: true + direction: output + - name: "--output_solution_mod1" + __merge__: /src/tasks/match_modalities/api/file_solution_mod1.yaml + required: true + direction: output + - name: "--output_solution_mod2" + __merge__: /src/tasks/match_modalities/api/file_solution_mod2.yaml + required: true + direction: output + resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - path: /src/wf_utils/helper.nf + dependencies: + - name: common/check_dataset_schema + - name: match_modalities/process_dataset +platforms: + - type: nextflow diff --git a/src/tasks/match_modalities/workflows/process_datasets/main.nf b/src/tasks/match_modalities/workflows/process_datasets/main.nf new file mode 100644 index 0000000000..ab5e9a83b0 --- /dev/null +++ b/src/tasks/match_modalities/workflows/process_datasets/main.nf @@ -0,0 +1,82 @@ +include { findArgumentSchema } from "${meta.resources_dir}/helper.nf" + +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + output_ch = input_ch + + | check_dataset_schema.run( + key: "check_dataset_schema_mod1", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input_mod1") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input_mod1, + "schema": schemaYaml + ] + }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset_mod1": checks["exit_code"] == 0 ? 
state.input_mod1 : null, + ] + } + ) + + | check_dataset_schema.run( + key: "check_dataset_schema_mod2", + fromState: { id, state -> + def schema = findArgumentSchema(meta.config, "input_mod2") + def schemaYaml = tempFile("schema.yaml") + writeYaml(schema, schemaYaml) + [ + "input": state.input_mod2, + "schema": schemaYaml + ] + }, + toState: { id, output, state -> + // read the output to see if dataset passed the qc + def checks = readYaml(output.output) + state + [ + "dataset_mod2": checks["exit_code"] == 0 ? state.input_mod2 : null, + ] + } + ) + + // remove datasets which didn't pass the schema check + | filter { id, state -> + state.dataset_mod1 != null && state.dataset_mod2 != null + } + + | process_dataset.run( + fromState: [ input_mod1: "dataset_mod1", input_mod2: "dataset_mod2" ], + toState: [ + "output_mod1", + "output_mod2", + "output_solution_mod1", + "output_solution_mod2" + ] + ) + + // only output the files for which an output file was specified + | setState([ + "output_mod1", + "output_mod2", + "output_solution_mod1", + "output_solution_mod2" + ]) + + emit: + output_ch +} diff --git a/src/tasks/match_modalities/workflows/run_benchmark/config.vsh.yaml b/src/tasks/match_modalities/workflows/run_benchmark/config.vsh.yaml new file mode 100644 index 0000000000..89da796600 --- /dev/null +++ b/src/tasks/match_modalities/workflows/run_benchmark/config.vsh.yaml @@ -0,0 +1,75 @@ +functionality: + name: "run_benchmark" + namespace: "match_modalities/workflows" + argument_groups: + - name: Inputs + arguments: + - name: "--input_mod1" + __merge__: /src/tasks/match_modalities/api/file_dataset_mod1.yaml + direction: input + required: true + - name: "--input_mod2" + __merge__: /src/tasks/match_modalities/api/file_dataset_mod2.yaml + direction: input + required: true + - name: "--input_solution_mod1" + __merge__: /src/tasks/match_modalities/api/file_solution_mod1.yaml + direction: input + required: true + - name: "--input_solution_mod2" + __merge__: /src/tasks/match_modalities/api/file_solution_mod2.yaml + direction: input + required: true + - name: Outputs + arguments: + - name: "--output_scores" + type: file + required: true + direction: output + description: A yaml file containing the scores of each of the methods + default: score_uns.yaml + - name: "--output_method_configs" + type: file + required: true + direction: output + default: method_configs.yaml + - name: "--output_metric_configs" + type: file + required: true + direction: output + default: metric_configs.yaml + - name: "--output_dataset_info" + type: file + required: true + direction: output + default: dataset_uns.yaml + - name: "--output_task_info" + type: file + required: true + direction: output + default: task_info.yaml + - name: Methods + arguments: + - name: "--method_ids" + type: string + multiple: true + description: A list of method ids to run. If not specified, all methods will be run. 
+ resources: + - type: nextflow_script + path: main.nf + entrypoint: run_wf + - type: file + path: "/src/tasks/match_modalities/api/task_info.yaml" + dependencies: + - name: common/check_dataset_schema + - name: common/extract_metadata + - name: match_modalities/control_methods/random_features + - name: match_modalities/control_methods/true_features + - name: match_modalities/methods/fastmnn + - name: match_modalities/methods/scot + - name: match_modalities/methods/harmonic_alignment + - name: match_modalities/methods/procrustes + - name: match_modalities/metrics/knn_auc + - name: match_modalities/metrics/mse +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/tasks/match_modalities/workflows/run_benchmark/main.nf b/src/tasks/match_modalities/workflows/run_benchmark/main.nf new file mode 100644 index 0000000000..53753f3981 --- /dev/null +++ b/src/tasks/match_modalities/workflows/run_benchmark/main.nf @@ -0,0 +1,202 @@ +workflow auto { + findStates(params, meta.config) + | meta.workflow.run( + auto: [publish: "state"] + ) +} + +workflow run_wf { + take: + input_ch + + main: + + // construct list of methods + methods = [ + random_features, + true_features, + scot, + harmonic_alignment, + fastmnn, + procrustes + ] + + // construct list of metrics + metrics = [ + knn_auc, + mse + ] + + /**************************** + * EXTRACT DATASET METADATA * + ****************************/ + dataset_ch = input_ch + // store join id + | map{ id, state -> + [id, state + ["_meta": [join_id: id]]] + } + + // extract the dataset metadata + | extract_metadata.run( + fromState: [input: "input_solution_mod1"], + toState: { id, output, state -> + state + [ + dataset_uns: readYaml(output.output).uns + ] + } + ) + + /*************************** + * RUN METHODS AND METRICS * + ***************************/ + score_ch = dataset_ch + + // run all methods + | runEach( + components: methods, + + // use the 'filter' argument to only run a method on the normalisation the component is asking for + filter: { id, state, comp -> + def norm = state.dataset_uns.normalization_id + def pref = comp.config.functionality.info.preferred_normalization + // if the preferred normalisation is none at all, + // we can pass whichever dataset we want + def norm_check = (norm == "log_cp10k" && pref == "counts") || norm == pref + def method_check = !state.method_ids || state.method_ids.contains(comp.config.functionality.name) + + method_check && norm_check + }, + + // define a new 'id' by appending the method name to the dataset id + id: { id, state, comp -> + id + "." + comp.config.functionality.name + }, + + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: { id, state, comp -> + def new_args = [ + input_mod1: state.input_mod1, + input_mod2: state.input_mod2 + ] + if (comp.config.functionality.info.type == "control_method") { + new_args.input_solution_mod1 = state.input_solution_mod1 + new_args.input_solution_mod2 = state.input_solution_mod2 + } + new_args + }, + + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + method_id: comp.config.functionality.name, + method_output_mod1: output.output_mod1, + method_output_mod2: output.output_mod2 + ] + } + ) + + // run all metrics + | runEach( + components: metrics, + id: { id, state, comp -> + id + "." 
+ comp.config.functionality.name + }, + // use 'fromState' to fetch the arguments the component requires from the overall state + fromState: [ + input_integrated_mod1: "method_output_mod1", + input_integrated_mod2: "method_output_mod2", + input_solution_mod1: "input_solution_mod1", + input_solution_mod2: "input_solution_mod2" + ], + // use 'toState' to publish that component's outputs to the overall state + toState: { id, output, state, comp -> + state + [ + metric_id: comp.config.functionality.name, + metric_output: output.output + ] + } + ) + + /****************************** + * GENERATE OUTPUT YAML FILES * + ******************************/ + // TODO: can we store everything below in a separate helper function? + + // extract the dataset metadata + dataset_meta_ch = dataset_ch + // only keep one of the normalization methods + | filter{ id, state -> + state.dataset_uns.normalization_id == "log_cp10k" + } + + | joinStates { ids, states -> + // store the dataset metadata in a file + def dataset_uns = states.collect{state -> + def uns = state.dataset_uns.clone() + uns.remove("normalization_id") + uns + } + def dataset_uns_yaml_blob = toYamlBlob(dataset_uns) + def dataset_uns_file = tempFile("dataset_uns.yaml") + dataset_uns_file.write(dataset_uns_yaml_blob) + + ["output", [output_dataset_info: dataset_uns_file]] + } + + output_ch = score_ch + + // extract the scores + | extract_metadata.run( + key: "extract_scores", + fromState: [input: "metric_output"], + toState: { id, output, state -> + state + [ + score_uns: readYaml(output.output).uns + ] + } + ) + + | joinStates { ids, states -> + + // store the method configs in a file + def method_configs = methods.collect{it.config} + def method_configs_yaml_blob = toYamlBlob(method_configs) + def method_configs_file = tempFile("method_configs.yaml") + method_configs_file.write(method_configs_yaml_blob) + + // store the metric configs in a file + def metric_configs = metrics.collect{it.config} + def metric_configs_yaml_blob = toYamlBlob(metric_configs) + def metric_configs_file = tempFile("metric_configs.yaml") + metric_configs_file.write(metric_configs_yaml_blob) + + def task_info_file = meta.resources_dir.resolve("task_info.yaml") + + // store the scores in a file + def score_uns = states.collect{it.score_uns} + def score_uns_yaml_blob = toYamlBlob(score_uns) + def score_uns_file = tempFile("score_uns.yaml") + score_uns_file.write(score_uns_yaml_blob) + + def new_state = [ + output_method_configs: method_configs_file, + output_metric_configs: metric_configs_file, + output_task_info: task_info_file, + output_scores: score_uns_file, + _meta: states[0]._meta + ] + + ["output", new_state] + } + + // merge all of the output data + | mix(dataset_meta_ch) + | joinStates{ ids, states -> + def mergedStates = states.inject([:]) { acc, m -> acc + m } + [ids[0], mergedStates] + } + + emit: + output_ch + +} diff --git a/src/tasks/match_modalities/workflows/run_benchmark/run_test.sh b/src/tasks/match_modalities/workflows/run_benchmark/run_test.sh new file mode 100644 index 0000000000..ee7c4c9909 --- /dev/null +++ b/src/tasks/match_modalities/workflows/run_benchmark/run_test.sh @@ -0,0 +1,31 @@ +#!/bin/bash + +# get the root of the directory +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +# export TOWER_WORKSPACE_ID=53907369739130 + +DATASETS_DIR="resources_test/match_modalities" +OUTPUT_DIR="resources_test/match_modalities/benchmarks/openproblems_v1" + +if [ ! 
-d "$OUTPUT_DIR" ]; then + mkdir -p "$OUTPUT_DIR" +fi + +export NXF_VER=22.04.5 +nextflow run . \ + -main-script target/nextflow/match_modalities/workflows/run_benchmark/main.nf \ + -profile docker \ + -resume \ + -entry auto \ + -c src/wf_utils/labels_ci.config \ + --id resources_test \ + --input_states "$DATASETS_DIR/**/state.yaml" \ + --rename_keys 'input_mod1:output_mod1,input_mod2:output_mod2,input_solution_mod1:output_solution_mod1,input_solution_mod2:output_solution_mod2' \ + --settings '{"output_scores": "scores.yaml", "output_dataset_info": "dataset_info.yaml", "output_method_configs": "method_configs.yaml", "output_metric_configs": "metric_configs.yaml", "output_task_info": "task_info.yaml"}' \ + --publish_dir "$OUTPUT_DIR" \ No newline at end of file diff --git a/src/tasks/predict_modality/README.md b/src/tasks/predict_modality/README.md new file mode 100644 index 0000000000..4b361c52fb --- /dev/null +++ b/src/tasks/predict_modality/README.md @@ -0,0 +1,486 @@ +# Predict Modality + + +Predicting the profiles of one modality (e.g. protein abundance) from +another (e.g. mRNA expression). + +Path: +[`src/tasks/predict_modality`](https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/predict_modality) + +## Motivation + +Experimental techniques to measure multiple modalities within the same +single cell are increasingly becoming available. The demand for these +measurements is driven by the promise to provide a deeper insight into +the state of a cell. Yet, the modalities are also intrinsically linked. +We know that DNA must be accessible (ATAC data) to produce mRNA +(expression data), and mRNA in turn is used as a template to produce +protein (protein abundance). These processes are regulated often by the +same molecules that they produce: for example, a protein may bind DNA to +prevent the production of more mRNA. Understanding these regulatory +processes would be transformative for synthetic biology and drug target +discovery. Any method that can predict a modality from another must have +accounted for these regulatory processes, but the demand for multi-modal +data shows that this is not trivial. + +## Description + +In this task, the goal is to take one modality and predict the other +modality for all features in each cell. This task requires translating +information between multiple layers of gene regulation. In some ways, +this is similar to the task of machine translation. In machine +translation, the same sentiment is expressed in multiple languages and +the goal is to train a model to represent the same meaning in a +different language. In this context, the same cellular state is measured +in two different feature sets and the goal of this task is to translate +the information about cellular state from one modality to the other. 
+ +## Authors & contributors + +| name | roles | +|:-------------------|:-------------------| +| Robrecht Cannoodt | author, maintainer | +| Kai Waldrant | contributor | +| Louise Deconinck | author | +| Alex Tong | author | +| Bastian Rieck | author | +| Daniel Burkhardt | author | +| Alejandro Granados | author | + +## API + +``` mermaid +flowchart LR + file_common_dataset_mod1("Raw dataset RNA") + comp_process_dataset[/"Data processor"/] + file_train_mod1("Train mod1") + file_train_mod2("Train mod2") + file_test_mod1("Test mod1") + file_test_mod2("Test mod2") + comp_control_method[/"Control method"/] + comp_method[/"Method"/] + comp_metric[/"Metric"/] + file_prediction("Prediction") + file_score("Score") + file_common_dataset_mod2("Raw dataset mod2") + file_common_dataset_mod1---comp_process_dataset + comp_process_dataset-->file_train_mod1 + comp_process_dataset-->file_train_mod2 + comp_process_dataset-->file_test_mod1 + comp_process_dataset-->file_test_mod2 + file_train_mod1---comp_control_method + file_train_mod1---comp_method + file_train_mod2---comp_control_method + file_train_mod2---comp_method + file_test_mod1---comp_control_method + file_test_mod1---comp_method + file_test_mod2---comp_control_method + file_test_mod2---comp_metric + comp_control_method-->file_prediction + comp_method-->file_prediction + comp_metric-->file_score + file_prediction---comp_metric + file_common_dataset_mod2---comp_process_dataset +``` + +## File format: Raw dataset RNA + +The RNA modality of the raw dataset. + +Example file: +`resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod1.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'feature_id', 'feature_name' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["feature_id"]` | `string` | Unique identifier for the feature, usually a ENSEMBL gene id. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | The unique identifier of the normalization method used. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
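+
+For orientation, the hedged snippet below loads such a file and touches the documented slots. The path is the test resource mentioned above and is assumed to have been generated locally; this is purely illustrative and not part of the pipeline.
+
+``` python
+import anndata as ad
+
+# illustrative only: inspect the raw RNA modality of the common dataset
+adata = ad.read_h5ad("resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod1.h5ad")
+
+print(adata.uns["dataset_id"], adata.uns["normalization_id"])
+print(adata.obs["batch"].value_counts())    # batch information per cell
+print(adata.layers["counts"][:5, :5])       # raw counts
+print(adata.layers["normalized"][:5, :5])   # normalized expression values
+```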
+ +## Component type: Data processor + +Path: +[`src/predict_modality`](https://github.com/openproblems-bio/openproblems/tree/main/src/predict_modality) + +A predict modality dataset processor. + +Arguments: + +
+ +| Name | Type | Description | +|:----------------------|:----------|:---------------------------------------------------------------------------| +| `--input_mod1` | `file` | The RNA modality of the raw dataset. | +| `--input_mod2` | `file` | The second modality of the raw dataset. Must be an ADT or an ATAC dataset. | +| `--output_train_mod1` | `file` | (*Output*) The mod1 expression values of the train cells. | +| `--output_train_mod2` | `file` | (*Output*) The mod2 expression values of the train cells. | +| `--output_test_mod1` | `file` | (*Output*) The mod1 expression values of the test cells. | +| `--output_test_mod2` | `file` | (*Output*) The mod2 expression values of the test cells. | +| `--seed` | `integer` | (*Optional*) The seed for determining the train/test split. Default: `1`. | + +
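+
+To make the argument semantics concrete, here is a hedged sketch of the kind of split this component performs. It assumes a plain seeded random split over cells and hypothetical local file names; the actual component may stratify the split differently, so treat it purely as an illustration.
+
+``` python
+import anndata as ad
+import numpy as np
+
+mod1 = ad.read_h5ad("dataset_mod1.h5ad")  # --input_mod1 (hypothetical local copies)
+mod2 = ad.read_h5ad("dataset_mod2.h5ad")  # --input_mod2
+
+rng = np.random.default_rng(seed=1)       # --seed
+test_mask = rng.random(mod1.n_obs) < 0.2  # assumed 80/20 train/test split
+
+mod1[~test_mask].copy().write_h5ad("train_mod1.h5ad")  # --output_train_mod1
+mod2[~test_mask].copy().write_h5ad("train_mod2.h5ad")  # --output_train_mod2
+mod1[test_mask].copy().write_h5ad("test_mod1.h5ad")    # --output_test_mod1
+mod2[test_mask].copy().write_h5ad("test_mod2.h5ad")    # --output_test_mod2
+```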
+ +## File format: Train mod1 + +The mod1 expression values of the train cells. + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod1.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'gene_ids' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'common_dataset_id', 'dataset_organism', 'normalization_id', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["common_dataset_id"]` | `string` | (*Optional*) A common identifier for the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | The unique identifier of the normalization method used. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
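+
+Train mod1 and train mod2 (below) are expected to describe the same cells in the same order, differing only in their feature space. A quick, hedged sanity check on hypothetical local copies:
+
+``` python
+import anndata as ad
+
+train_mod1 = ad.read_h5ad("train_mod1.h5ad")
+train_mod2 = ad.read_h5ad("train_mod2.h5ad")
+
+# same cells in the same order; the number of features is modality-specific
+assert (train_mod1.obs_names == train_mod2.obs_names).all()
+print(train_mod1.shape, train_mod2.shape)
+```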
+ +## File format: Train mod2 + +The mod2 expression values of the train cells. + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod2.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'gene_ids' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'common_dataset_id', 'dataset_organism', 'normalization_id', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["common_dataset_id"]` | `string` | (*Optional*) A common identifier for the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | The unique identifier of the normalization method used. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
+ +## File format: Test mod1 + +The mod1 expression values of the test cells. + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/test_mod1.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'gene_ids' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'common_dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["common_dataset_id"]` | `string` | (*Optional*) A common identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | The unique identifier of the normalization method used. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
+ +## File format: Test mod2 + +The mod2 expression values of the test cells. + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/test_mod2.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'gene_ids' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'common_dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["gene_ids"]` | `string` | (*Optional*) The gene identifiers (if available). | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["common_dataset_id"]` | `string` | (*Optional*) A common identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
+ +## Component type: Control method + +Path: +[`src/predict_modality/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/predict_modality/control_methods) + +Quality control methods for verifying the pipeline. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:-------------------------------------------------------------------------| +| `--input_train_mod1` | `file` | The mod1 expression values of the train cells. | +| `--input_train_mod2` | `file` | The mod2 expression values of the train cells. | +| `--input_test_mod1` | `file` | The mod1 expression values of the test cells. | +| `--input_test_mod2` | `file` | The mod2 expression values of the test cells. | +| `--output` | `file` | (*Output*) A prediction of the mod2 expression values of the test cells. | + +
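+
+To make the interface concrete, here is a hedged sketch of a trivial control method: it ignores mod1 entirely and predicts the per-feature mean of the training mod2 values for every test cell. The file names and `method_id` are illustrative; in a real component, viash fills in the `par` dictionary.
+
+``` python
+import anndata as ad
+import numpy as np
+import scipy.sparse as sp
+
+train_mod2 = ad.read_h5ad("train_mod2.h5ad")
+test_mod1 = ad.read_h5ad("test_mod1.h5ad")
+
+# per-feature mean of the training target, repeated for every test cell
+mean_profile = np.asarray(train_mod2.layers["normalized"].mean(axis=0)).flatten()
+prediction = np.tile(mean_profile, (test_mod1.n_obs, 1))
+
+output = ad.AnnData(
+    obs=test_mod1.obs[[]],
+    var=train_mod2.var[[]],
+    layers={"normalized": sp.csr_matrix(prediction)},
+    uns={
+        "dataset_id": train_mod2.uns["dataset_id"],
+        "method_id": "mean_per_gene",  # illustrative identifier
+    },
+)
+output.write_h5ad("prediction.h5ad", compression="gzip")
+```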
+ +## Component type: Method + +Path: +[`src/predict_modality/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/predict_modality/methods) + +A regression method. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:-------------------------------------------------------------------------| +| `--input_train_mod1` | `file` | The mod1 expression values of the train cells. | +| `--input_train_mod2` | `file` | The mod2 expression values of the train cells. | +| `--input_test_mod1` | `file` | The mod1 expression values of the test cells. | +| `--output` | `file` | (*Output*) A prediction of the mod2 expression values of the test cells. | + +
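+
+As a sketch of what a simple, non-neural method could look like under this interface: embed mod1 with truncated SVD, then use k-nearest-neighbour regression against the training mod2 profiles. This is an illustration under assumptions (hypothetical file names, arbitrary hyperparameters), not one of the benchmarked methods.
+
+``` python
+import anndata as ad
+import numpy as np
+import scipy.sparse as sp
+from sklearn.decomposition import TruncatedSVD
+from sklearn.neighbors import KNeighborsRegressor
+
+train_mod1 = ad.read_h5ad("train_mod1.h5ad")
+train_mod2 = ad.read_h5ad("train_mod2.h5ad")
+test_mod1 = ad.read_h5ad("test_mod1.h5ad")
+
+# embed train and test mod1 profiles in a shared low-dimensional space
+svd = TruncatedSVD(n_components=50, random_state=0)
+emb = svd.fit_transform(sp.vstack([
+    sp.csr_matrix(train_mod1.layers["normalized"]),
+    sp.csr_matrix(test_mod1.layers["normalized"]),
+]))
+emb_train, emb_test = emb[:train_mod1.n_obs], emb[train_mod1.n_obs:]
+
+# predict mod2 as the average profile of the nearest training cells
+y_train = train_mod2.layers["normalized"]
+y_train = y_train.toarray() if sp.issparse(y_train) else np.asarray(y_train)
+knn = KNeighborsRegressor(n_neighbors=25)
+knn.fit(emb_train, y_train)
+pred = knn.predict(emb_test)
+
+output = ad.AnnData(
+    obs=test_mod1.obs[[]],
+    var=train_mod2.var[[]],
+    layers={"normalized": sp.csr_matrix(pred)},
+    uns={"dataset_id": train_mod2.uns["dataset_id"], "method_id": "svd_knn"},  # illustrative id
+)
+output.write_h5ad("prediction.h5ad", compression="gzip")
+```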
+ +## Component type: Metric + +Path: +[`src/predict_modality/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/predict_modality/metrics) + +A predict modality metric. + +Arguments: + +
+ +| Name | Type | Description | +|:---------------------|:-------|:--------------------------------------------------------------| +| `--input_prediction` | `file` | A prediction of the mod2 expression values of the test cells. | +| `--input_test_mod2` | `file` | The mod2 expression values of the test cells. | +| `--output` | `file` | (*Output*) Metric score file. | + +
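+
+For illustration, a hedged sketch of a root-mean-square-error style metric on the normalized layers, writing a score file in the format documented further below. The metric id and file names are assumptions; the task's actual metrics live in their own components.
+
+``` python
+import anndata as ad
+import numpy as np
+import scipy.sparse as sp
+
+prediction = ad.read_h5ad("prediction.h5ad")
+test_mod2 = ad.read_h5ad("test_mod2.h5ad")
+
+def to_dense(x):
+    return x.toarray() if sp.issparse(x) else np.asarray(x)
+
+# overall RMSE between predicted and measured normalized values
+diff = to_dense(prediction.layers["normalized"]) - to_dense(test_mod2.layers["normalized"])
+rmse = float(np.sqrt(np.mean(diff ** 2)))
+
+score = ad.AnnData(
+    shape=(0, 0),
+    uns={
+        "dataset_id": test_mod2.uns["dataset_id"],
+        "method_id": prediction.uns["method_id"],
+        "metric_ids": "rmse",        # illustrative metric id
+        "metric_values": rmse,
+    },
+)
+score.write_h5ad("score.h5ad", compression="gzip")
+```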
+ +## File format: Prediction + +A prediction of the mod2 expression values of the test cells + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/prediction.h5ad` + +Format: + +
+ + AnnData object + layers: 'normalized' + uns: 'dataset_id', 'method_id' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------|:---------|:----------------------------------------| +| `layers["normalized"]` | `double` | Predicted normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
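+
+A short, hedged conformance check for a prediction file might look as follows (hypothetical local files; the benchmark performs stricter schema checks through its own components).
+
+``` python
+import anndata as ad
+
+pred = ad.read_h5ad("prediction.h5ad")
+test_mod2 = ad.read_h5ad("test_mod2.h5ad")
+
+assert "normalized" in pred.layers
+assert pred.shape == test_mod2.shape  # one value per test cell and mod2 feature
+for key in ("dataset_id", "method_id"):
+    assert key in pred.uns, f"missing uns['{key}']"
+```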
+ +## File format: Score + +Metric score file + +Example file: +`resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:-----------------------|:---------|:---------------------------------------------------------------------------------------------| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. | + +
+ +## File format: Raw dataset mod2 + +The second modality of the raw dataset. Must be an ADT or an ATAC +dataset + +Example file: +`resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod2.h5ad` + +Format: + +
+ + AnnData object + obs: 'batch', 'size_factors' + var: 'feature_id', 'feature_name' + obsm: 'gene_activity' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'gene_activity_var_names' + +
+ +Slot description: + +
+ +| Slot | Type | Description | +|:---------------------------------|:----------|:-------------------------------------------------------------------------------| +| `obs["batch"]` | `string` | Batch information. | +| `obs["size_factors"]` | `double` | (*Optional*) The size factors of the cells prior to normalization. | +| `var["feature_id"]` | `string` | Unique identifier for the feature, usually a ENSEMBL gene id. | +| `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. | +| `obsm["gene_activity"]` | `double` | (*Optional*) ATAC gene activity. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | The unique identifier of the normalization method used. | +| `uns["gene_activity_var_names"]` | `string` | (*Optional*) Names of the gene activity matrix. | + +
+ diff --git a/src/tasks/predict_modality/api/comp_control_method.yaml b/src/tasks/predict_modality/api/comp_control_method.yaml new file mode 100644 index 0000000000..82ab6e441f --- /dev/null +++ b/src/tasks/predict_modality/api/comp_control_method.yaml @@ -0,0 +1,42 @@ +functionality: + namespace: "predict_modality/control_methods" + info: + type: control_method + preferred_normalization: counts # there is currently only one type of normalization + type_info: + label: Control method + summary: Quality control methods for verifying the pipeline. + description: | + These components have the same interface as the regular methods + but also receive the solution object as input. It serves as a + starting point to test the relative accuracy of new methods in + the task, and also as a quality control for the metrics defined + in the task. + arguments: + - name: "--input_train_mod1" + __merge__: file_train_mod1.yaml + direction: input + required: true + - name: "--input_train_mod2" + __merge__: file_train_mod2.yaml + direction: input + required: true + - name: "--input_test_mod1" + __merge__: file_test_mod1.yaml + direction: input + required: true + - name: "--input_test_mod2" + __merge__: file_test_mod2.yaml + direction: input + required: true + - name: "--output" + __merge__: file_prediction.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap + dest: resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap \ No newline at end of file diff --git a/src/tasks/predict_modality/api/comp_method.yaml b/src/tasks/predict_modality/api/comp_method.yaml new file mode 100644 index 0000000000..49ccc1e27b --- /dev/null +++ b/src/tasks/predict_modality/api/comp_method.yaml @@ -0,0 +1,34 @@ +functionality: + namespace: "predict_modality/methods" + info: + type: method + type_info: + label: Method + summary: A regression method. + description: | + A regression method to predict the expression of one modality from another. + arguments: + - name: "--input_train_mod1" + __merge__: file_train_mod1.yaml + direction: input + required: true + - name: "--input_train_mod2" + __merge__: file_train_mod2.yaml + direction: input + required: true + - name: "--input_test_mod1" + __merge__: file_test_mod1.yaml + direction: input + required: true + - name: "--output" + __merge__: file_prediction.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_method_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap + dest: resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap + - path: /src/common/library.bib \ No newline at end of file diff --git a/src/tasks/predict_modality/api/comp_method_predict.yaml b/src/tasks/predict_modality/api/comp_method_predict.yaml new file mode 100644 index 0000000000..a43cd1e5c5 --- /dev/null +++ b/src/tasks/predict_modality/api/comp_method_predict.yaml @@ -0,0 +1,30 @@ +functionality: + namespace: "predict_modality/methods" + info: + type: method_predict + type_info: + label: Predict + summary: Make predictions using a trained model. + description: | + This method makes predictions using a trained model. 
+ arguments: + - name: "--input_train_mod1" + __merge__: file_train_mod1.yaml + direction: input + required: false + - name: "--input_train_mod2" + __merge__: file_train_mod2.yaml + direction: input + required: false + - name: "--input_test_mod1" + __merge__: file_test_mod1.yaml + direction: input + required: true + - name: "--input_model" + __merge__: file_pretrained_model.yaml + direction: input + required: true + - name: "--output" + __merge__: file_prediction.yaml + direction: output + required: true \ No newline at end of file diff --git a/src/tasks/predict_modality/api/comp_method_train.yaml b/src/tasks/predict_modality/api/comp_method_train.yaml new file mode 100644 index 0000000000..3f07c1efcf --- /dev/null +++ b/src/tasks/predict_modality/api/comp_method_train.yaml @@ -0,0 +1,26 @@ +functionality: + namespace: "predict_modality/methods" + info: + type: method_train + type_info: + label: Train + summary: Train a model to predict the expression of one modality from another. + description: | + This method trains a model to predict the expression of one modality from another. + arguments: + - name: "--input_train_mod1" + __merge__: file_train_mod1.yaml + direction: input + required: true + - name: "--input_train_mod2" + __merge__: file_train_mod2.yaml + direction: input + required: true + - name: "--input_test_mod1" + __merge__: file_test_mod1.yaml + direction: input + required: false + - name: "--output" + __merge__: file_pretrained_model.yaml + direction: output + required: true \ No newline at end of file diff --git a/src/tasks/predict_modality/api/comp_metric.yaml b/src/tasks/predict_modality/api/comp_metric.yaml new file mode 100644 index 0000000000..c85f900e46 --- /dev/null +++ b/src/tasks/predict_modality/api/comp_metric.yaml @@ -0,0 +1,30 @@ +functionality: + namespace: "predict_modality/metrics" + info: + type: metric + type_info: + label: Metric + summary: A predict modality metric. + description: | + A metric for evaluating predicted expression. + arguments: + - name: --input_prediction + __merge__: file_prediction.yaml + direction: input + required: true + - name: --input_test_mod2 + __merge__: file_test_mod2.yaml + direction: input + required: true + - name: --output + __merge__: file_score.yaml + direction: output + required: true + test_resources: + - type: python_script + path: /src/common/comp_tests/check_metric_config.py + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap + dest: resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap + - path: /src/common/library.bib \ No newline at end of file diff --git a/src/tasks/predict_modality/api/comp_process_dataset.yaml b/src/tasks/predict_modality/api/comp_process_dataset.yaml new file mode 100644 index 0000000000..c2c5feb2eb --- /dev/null +++ b/src/tasks/predict_modality/api/comp_process_dataset.yaml @@ -0,0 +1,43 @@ +functionality: + namespace: "predict_modality" + info: + type: process_dataset + type_info: + label: Data processor + summary: A predict modality dataset processor. + description: | + A component for processing a Common Dataset into a task-specific dataset. 
+ arguments: + - name: "--input_mod1" + __merge__: file_common_dataset_mod1.yaml + direction: input + required: true + - name: "--input_mod2" + __merge__: file_common_dataset_mod2.yaml + direction: input + required: true + - name: "--output_train_mod1" + __merge__: file_train_mod1.yaml + direction: output + required: true + - name: "--output_train_mod2" + __merge__: file_train_mod2.yaml + direction: output + required: true + - name: "--output_test_mod1" + __merge__: file_test_mod1.yaml + direction: "output" + required: true + - name: "--output_test_mod2" + __merge__: file_test_mod2.yaml + direction: output + required: true + - name: "--seed" + type: integer + default: 1 + description: "The seed for determining the train/test split." + test_resources: + - type: python_script + path: /src/common/comp_tests/run_and_check_adata.py + - path: /resources_test/common/openproblems_neurips2021/bmmc_cite + dest: resources_test/common/openproblems_neurips2021/bmmc_cite \ No newline at end of file diff --git a/src/tasks/predict_modality/api/file_common_dataset_mod1.yaml b/src/tasks/predict_modality/api/file_common_dataset_mod1.yaml new file mode 100644 index 0000000000..4824a05c46 --- /dev/null +++ b/src/tasks/predict_modality/api/file_common_dataset_mod1.yaml @@ -0,0 +1,98 @@ +type: file +example: "resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod1.h5ad" +info: + label: "Raw dataset RNA" + summary: "The RNA modality of the raw dataset." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: double + name: size_factors + description: The size factors of the cells prior to normalization. + required: false + var: + - type: string + name: feature_id + description: Unique identifier for the feature, usually a ENSEMBL gene id. + # TODO: make this required once openproblems_v1 dataloader supports it + required: true + + - type: string + name: feature_name + description: A human-readable name for the feature, usually a gene symbol. + # TODO: make this required once the dataloader supports it + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A ranking of the features by hvg. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. 
+ required: true
+ - name: dataset_organism
+ type: string
+ description: The organism of the sample in the dataset.
+ required: false
+ - name: normalization_id
+ type: string
+ description: The unique identifier of the normalization method used.
+ required: true
+ - type: string
+ name: gene_activity_var_names
+ description: "Names of the gene activity matrix"
+ required: false
+ obsm:
+ - type: double
+ name: gene_activity
+ description: ATAC gene activity
+ required: false
\ No newline at end of file
diff --git a/src/tasks/predict_modality/api/file_common_dataset_mod2.yaml b/src/tasks/predict_modality/api/file_common_dataset_mod2.yaml
new file mode 100644
index 0000000000..e0b1b3bae9
--- /dev/null
+++ b/src/tasks/predict_modality/api/file_common_dataset_mod2.yaml
@@ -0,0 +1,98 @@
+type: file
+example: "resources_test/common/openproblems_neurips2021/bmmc_cite/dataset_mod2.h5ad"
+info:
+ label: "Raw dataset mod2"
+ summary: "The second modality of the raw dataset. Must be an ADT or an ATAC dataset"
+ slots:
+ layers:
+ - type: integer
+ name: counts
+ description: Raw counts
+ required: true
+ - type: double
+ name: normalized
+ description: Normalized expression values
+ required: true
+ obs:
+ - type: string
+ name: batch
+ description: Batch information
+ required: true
+ - type: double
+ name: size_factors
+ description: The size factors of the cells prior to normalization.
+ required: false
+ var:
+ - type: string
+ name: feature_id
+ description: Unique identifier for the feature, usually an ENSEMBL gene id.
+ # TODO: make this required once openproblems_v1 dataloader supports it
+ required: true
+
+ - type: string
+ name: feature_name
+ description: A human-readable name for the feature, usually a gene symbol.
+ # TODO: make this required once the dataloader supports it
+ required: false
+
+ - type: boolean
+ name: hvg
+ description: Whether or not the feature is considered to be a 'highly variable gene'
+ required: true
+
+ - type: double
+ name: hvg_score
+ description: A score for the feature indicating how highly variable it is.
+ required: true
+ uns:
+ - type: string
+ name: dataset_id
+ description: "A unique identifier for the dataset"
+ required: true
+ - name: dataset_name
+ type: string
+ description: Nicely formatted name.
+ required: true
+ - type: string
+ name: dataset_url
+ description: Link to the original source of the dataset.
+ required: false
+ - name: dataset_reference
+ type: string
+ description: Bibtex reference of the paper in which the dataset was published.
+ required: false
+ - name: dataset_summary
+ type: string
+ description: Short description of the dataset.
+ required: true
+ - name: dataset_description
+ type: string
+ description: Long description of the dataset.
+ required: true
+ - name: dataset_organism
+ type: string
+ description: The organism of the sample in the dataset.
+ required: false
+ - name: normalization_id
+ type: string
+ description: The unique identifier of the normalization method used.
+ required: true + - type: string + name: gene_activity_var_names + description: "Names of the gene activity matrix" + required: false + obsm: + - type: double + name: gene_activity + description: ATAC gene activity + required: false \ No newline at end of file diff --git a/src/tasks/predict_modality/api/file_prediction.yaml b/src/tasks/predict_modality/api/file_prediction.yaml new file mode 100644 index 0000000000..0464b323d1 --- /dev/null +++ b/src/tasks/predict_modality/api/file_prediction.yaml @@ -0,0 +1,20 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/prediction.h5ad" +info: + label: "Prediction" + summary: "A prediction of the mod2 expression values of the test cells" + slots: + layers: + - type: double + name: normalized + description: Predicted normalized expression values + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true \ No newline at end of file diff --git a/src/tasks/predict_modality/api/file_pretrained_model.yaml b/src/tasks/predict_modality/api/file_pretrained_model.yaml new file mode 100644 index 0000000000..f8c4a717ac --- /dev/null +++ b/src/tasks/predict_modality/api/file_pretrained_model.yaml @@ -0,0 +1,4 @@ +type: file +info: + label: "Pretrained model" + summary: "A pretrained model for predicting the expression of one modality from another." diff --git a/src/tasks/predict_modality/api/file_score.yaml b/src/tasks/predict_modality/api/file_score.yaml new file mode 100644 index 0000000000..928e18eebf --- /dev/null +++ b/src/tasks/predict_modality/api/file_score.yaml @@ -0,0 +1,25 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/score.h5ad" +info: + label: "Score" + summary: "Metric score file" + slots: + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: method_id + description: "A unique identifier for the method" + required: true + - type: string + name: metric_ids + description: "One or more unique metric identifiers" + multiple: true + required: true + - type: double + name: metric_values + description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'." + multiple: true + required: true diff --git a/src/tasks/predict_modality/api/file_test_mod1.yaml b/src/tasks/predict_modality/api/file_test_mod1.yaml new file mode 100644 index 0000000000..fa67672104 --- /dev/null +++ b/src/tasks/predict_modality/api/file_test_mod1.yaml @@ -0,0 +1,85 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/test_mod1.h5ad" +info: + label: "Test mod1" + summary: "The mod1 expression values of the test cells." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: double + name: size_factors + description: The size factors of the cells prior to normalization. 
+ required: false + var: + - type: string + name: gene_ids + description: The gene identifiers (if available) + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: common_dataset_id + description: "A common identifier for the dataset" + required: false + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. + required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - name: normalization_id + type: string + description: The unique identifier of the normalization method used. + required: true + - type: string + name: gene_activity_var_names + description: "Names of the gene activity matrix" + required: false + obsm: + - type: double + name: gene_activity + description: ATAC gene activity + required: false diff --git a/src/tasks/predict_modality/api/file_test_mod2.yaml b/src/tasks/predict_modality/api/file_test_mod2.yaml new file mode 100644 index 0000000000..417edf6162 --- /dev/null +++ b/src/tasks/predict_modality/api/file_test_mod2.yaml @@ -0,0 +1,81 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/test_mod2.h5ad" +info: + label: "Test mod2" + summary: "The mod2 expression values of the test cells." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: double + name: size_factors + description: The size factors of the cells prior to normalization. + required: false + var: + - type: string + name: gene_ids + description: The gene identifiers (if available) + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: common_dataset_id + description: "A common identifier for the dataset" + required: false + - name: dataset_name + type: string + description: Nicely formatted name. + required: true + - type: string + name: dataset_url + description: Link to the original source of the dataset. + required: false + - name: dataset_reference + type: string + description: Bibtex reference of the paper in which the dataset was published. + required: false + - name: dataset_summary + type: string + description: Short description of the dataset. 
+ required: true + - name: dataset_description + type: string + description: Long description of the dataset. + required: true + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - type: string + name: gene_activity_var_names + description: "Names of the gene activity matrix" + required: false + obsm: + - type: double + name: gene_activity + description: ATAC gene activity + required: false \ No newline at end of file diff --git a/src/tasks/predict_modality/api/file_train_mod1.yaml b/src/tasks/predict_modality/api/file_train_mod1.yaml new file mode 100644 index 0000000000..a4919ee7bd --- /dev/null +++ b/src/tasks/predict_modality/api/file_train_mod1.yaml @@ -0,0 +1,65 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod1.h5ad" +info: + label: "Train mod1" + summary: "The mod1 expression values of the train cells." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: double + name: size_factors + description: The size factors of the cells prior to normalization. + required: false + var: + - type: string + name: gene_ids + description: The gene identifiers (if available) + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: common_dataset_id + description: "A common identifier for the dataset" + required: false + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - name: normalization_id + type: string + description: The unique identifier of the normalization method used. + required: true + - type: string + name: gene_activity_var_names + description: "Names of the gene activity matrix" + required: false + obsm: + - type: double + name: gene_activity + description: ATAC gene activity + required: false \ No newline at end of file diff --git a/src/tasks/predict_modality/api/file_train_mod2.yaml b/src/tasks/predict_modality/api/file_train_mod2.yaml new file mode 100644 index 0000000000..dcbfae45de --- /dev/null +++ b/src/tasks/predict_modality/api/file_train_mod2.yaml @@ -0,0 +1,65 @@ +type: file +example: "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod2.h5ad" +info: + label: "Train mod2" + summary: "The mod2 expression values of the train cells." + slots: + layers: + - type: integer + name: counts + description: Raw counts + required: true + - type: double + name: normalized + description: Normalized expression values + required: true + obs: + - type: string + name: batch + description: Batch information + required: true + - type: double + name: size_factors + description: The size factors of the cells prior to normalization. 
+ required: false + var: + - type: string + name: gene_ids + description: The gene identifiers (if available) + required: false + + - type: boolean + name: hvg + description: Whether or not the feature is considered to be a 'highly variable gene' + required: true + + - type: double + name: hvg_score + description: A score for the feature indicating how highly variable it is. + required: true + uns: + - type: string + name: dataset_id + description: "A unique identifier for the dataset" + required: true + - type: string + name: common_dataset_id + description: "A common identifier for the dataset" + required: false + - name: dataset_organism + type: string + description: The organism of the sample in the dataset. + required: false + - name: normalization_id + type: string + description: The unique identifier of the normalization method used. + required: true + - type: string + name: gene_activity_var_names + description: "Names of the gene activity matrix" + required: false + obsm: + - type: double + name: gene_activity + description: ATAC gene activity + required: false \ No newline at end of file diff --git a/src/tasks/predict_modality/api/task_info.yaml b/src/tasks/predict_modality/api/task_info.yaml new file mode 100644 index 0000000000..e0d1ed9da7 --- /dev/null +++ b/src/tasks/predict_modality/api/task_info.yaml @@ -0,0 +1,67 @@ +name: predict_modality +label: Predict Modality +summary: "Predicting the profiles of one modality (e.g. protein abundance) from another (e.g. mRNA expression)." +image: "thumbnail.svg" +motivation: | + Experimental techniques to measure multiple modalities within the same single cell are increasingly becoming available. + The demand for these measurements is driven by the promise to provide a deeper insight into the state of a cell. + Yet, the modalities are also intrinsically linked. We know that DNA must be accessible (ATAC data) to produce mRNA + (expression data), and mRNA in turn is used as a template to produce protein (protein abundance). These processes + are regulated often by the same molecules that they produce: for example, a protein may bind DNA to prevent the production + of more mRNA. Understanding these regulatory processes would be transformative for synthetic biology and drug target discovery. + Any method that can predict a modality from another must have accounted for these regulatory processes, but the demand for + multi-modal data shows that this is not trivial. +description: | + In this task, the goal is to take one modality and predict the other modality for all + features in each cell. This task requires translating information between multiple layers of + gene regulation. In some ways, this is similar to the task of machine translation. In machine translation, the same + sentiment is expressed in multiple languages and the goal is to train a model to represent the same meaning in a different + language. In this context, the same cellular state is measured in two different feature sets and the goal of this task + is to translate the information about cellular state from one modality to the other. 
+authors:
+ - name: Robrecht Cannoodt
+ roles: [ author, maintainer ]
+ info:
+ github: rcannood
+ orcid: "0000-0003-3641-729X"
+ - name: Kai Waldrant
+ roles: [ contributor ]
+ info:
+ github: KaiWaldrant
+ orcid: "0009-0003-8555-1361"
+ - name: Louise Deconinck
+ roles: [ author ]
+ info:
+ github: LouiseDck
+ - name: Alex Tong
+ roles: [ author ]
+ info:
+ github: atong01
+ - name: Bastian Rieck
+ roles: [ author ]
+ info:
+ github: Pseudomanifold
+ - name: Daniel Burkhardt
+ roles: [ author ]
+ info:
+ github: dburkhardt
+ - name: Alejandro Granados
+ roles: [ author ]
+ info:
+ github: agranado
+ - name: Kaiwen Deng
+ roles: [ contributor ]
+ info:
+ email: dengkw@umich.edu
+ github: nonztalk
+ - name: Xueer Chen
+ roles: [ contributor ]
+ info:
+ github: xuerchen
+ email: xc2579@columbia.edu
+ - name: Jiwei Liu
+ roles: [ contributor ]
+ info:
+ github: daxiongshu
+ email: jiweil@nvidia.com
+ orcid: "0000-0002-8799-9763"
diff --git a/src/tasks/predict_modality/api/thumbnail.svg b/src/tasks/predict_modality/api/thumbnail.svg
new file mode 100644
index 0000000000..59436e6187
--- /dev/null
+++ b/src/tasks/predict_modality/api/thumbnail.svg
@@ -0,0 +1,666 @@
[SVG markup not reproduced; the thumbnail depicts chromatin accessibility and gene expression profiles for cells 1-3, true vs. predicted gene expression for genes A-C, and the root mean square error metric.]
diff --git a/src/tasks/predict_modality/control_methods/meanpergene/config.vsh.yaml b/src/tasks/predict_modality/control_methods/meanpergene/config.vsh.yaml
new file mode 100644
index 0000000000..9521b90508
--- /dev/null
+++ b/src/tasks/predict_modality/control_methods/meanpergene/config.vsh.yaml
@@ -0,0 +1,17 @@
+__merge__: ../../api/comp_control_method.yaml
+functionality:
+ name: mean_per_gene
+ info:
+ label: Mean per gene
+ summary: Returns the mean expression value per gene.
+ description: Returns the mean expression value per gene.
+ resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] + \ No newline at end of file diff --git a/src/tasks/predict_modality/control_methods/meanpergene/script.py b/src/tasks/predict_modality/control_methods/meanpergene/script.py new file mode 100644 index 0000000000..043f19d42a --- /dev/null +++ b/src/tasks/predict_modality/control_methods/meanpergene/script.py @@ -0,0 +1,37 @@ +import anndata as ad +from scipy.sparse import csc_matrix +import numpy as np + +# VIASH START +par = { + "input_train_mod1": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod1.h5ad", + "input_test_mod1": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/test_mod1.h5ad", + "input_train_mod2": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod2.h5ad", + "output": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/prediction.h5ad", +} + +meta = { + "functionality_name": "foo" +} +# VIASH END + +input_test_mod1 = ad.read_h5ad(par["input_test_mod1"]) +input_train_mod2 = ad.read_h5ad(par["input_train_mod2"]) + + +# Find the correct shape +mean = np.array(input_train_mod2.layers["normalized"].mean(axis=0)).flatten() +prediction = csc_matrix(np.tile(mean, (input_test_mod1.shape[0], 1))) + +# Write out prediction +out = ad.AnnData( + layers={"normalized": prediction}, + shape=prediction.shape, + obs=input_test_mod1.obs, + var=input_train_mod2.var, + uns={ + "dataset_id": input_test_mod1.uns["dataset_id"], + "method_id": meta["functionality_name"], + } +) +out.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/predict_modality/control_methods/random_predict/config.vsh.yaml b/src/tasks/predict_modality/control_methods/random_predict/config.vsh.yaml new file mode 100644 index 0000000000..3324c53a91 --- /dev/null +++ b/src/tasks/predict_modality/control_methods/random_predict/config.vsh.yaml @@ -0,0 +1,16 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: random_predict + info: + label: Random predictions + summary: Returns random training profiles. + description: Returns random training profiles. 
+ resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/predict_modality/control_methods/random_predict/script.R b/src/tasks/predict_modality/control_methods/random_predict/script.R new file mode 100644 index 0000000000..ab96dcc26a --- /dev/null +++ b/src/tasks/predict_modality/control_methods/random_predict/script.R @@ -0,0 +1,34 @@ +cat("Loading dependencies\n") +requireNamespace("anndata", quietly = TRUE) +library(Matrix, warn.conflicts = FALSE, quietly = TRUE) + +## VIASH START +par <- list( + input_train_mod1 = "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod1.h5ad", + input_test_mod1 = "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/test_mod1.h5ad", + input_train_mod2 = "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod2.h5ad", + output = "output.h5ad" +) +meta <- list(functionality_name = "foo") +## VIASH END + +cat("Reading h5ad files\n") +input_train_mod2 <- anndata::read_h5ad(par$input_train_mod2) +input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1) + +cat("Creating outputs object\n") +sample_ix <- sample.int(nrow(input_train_mod2), nrow(input_test_mod1), replace = TRUE) +prediction <- input_train_mod2$layers[["normalized"]][sample_ix, , drop = FALSE] +rownames(prediction) <- rownames(input_test_mod1) + +out <- anndata::AnnData( + layers = list(normalized = prediction), + shape = dim(prediction), + uns = list( + dataset_id = input_train_mod2$uns[["dataset_id"]], + method_id = meta[["functionality_name"]] + ) +) + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/control_methods/solution/config.vsh.yaml b/src/tasks/predict_modality/control_methods/solution/config.vsh.yaml new file mode 100644 index 0000000000..350b0e79ea --- /dev/null +++ b/src/tasks/predict_modality/control_methods/solution/config.vsh.yaml @@ -0,0 +1,16 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: solution + info: + label: Solution + summary: Returns the ground-truth solution. + description: Returns the ground-truth solution. 
+ resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/predict_modality/control_methods/solution/script.R b/src/tasks/predict_modality/control_methods/solution/script.R new file mode 100644 index 0000000000..ae7c288e29 --- /dev/null +++ b/src/tasks/predict_modality/control_methods/solution/script.R @@ -0,0 +1,20 @@ +cat("Loading dependencies\n") +requireNamespace("anndata", quietly = TRUE) + +## VIASH START +par <- list( + input_test_mod2 = "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/test_mod2.h5ad", + output = "output.h5ad" +) + +meta <- list( + functionality_name = "foo" +) +## VIASH END + +cat("Reading h5ad files\n") +ad2_test <- anndata::read_h5ad(par$input_test_mod2) +ad2_test$uns[["method_id"]] <- meta$functionality_name + +cat("Writing predictions to file\n") +zzz <- ad2_test$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/control_methods/zeros/config.vsh.yaml b/src/tasks/predict_modality/control_methods/zeros/config.vsh.yaml new file mode 100644 index 0000000000..344df9c338 --- /dev/null +++ b/src/tasks/predict_modality/control_methods/zeros/config.vsh.yaml @@ -0,0 +1,16 @@ +__merge__: ../../api/comp_control_method.yaml +functionality: + name: zeros + info: + label: Zeros + summary: Returns a prediction consisting of all zeros. + description: Returns a prediction consisting of all zeros. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [midtime, lowmem, lowcpu] diff --git a/src/tasks/predict_modality/control_methods/zeros/script.py b/src/tasks/predict_modality/control_methods/zeros/script.py new file mode 100644 index 0000000000..600b5c696c --- /dev/null +++ b/src/tasks/predict_modality/control_methods/zeros/script.py @@ -0,0 +1,37 @@ +import anndata +from scipy.sparse import csc_matrix +import numpy as np + +# VIASH START +par = { + "input_train_mod1": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod1.h5ad", + "input_test_mod1": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/test_mod1.h5ad", + "input_train_mod2": "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod2.h5ad", + "output": "output.h5ad", +} + +meta = { + "functionality_name": "foo" +} +# VIASH END + +print("Reading h5ad files", flush=True) +ad_mod1_test = anndata.read_h5ad(par["input_test_mod1"]) +ad_mod2 = anndata.read_h5ad(par["input_train_mod2"]) + +print("create output objects", flush=True) +prediction = csc_matrix((ad_mod1_test.n_obs, ad_mod2.n_vars), dtype = np.float32) + +out = anndata.AnnData( + layers={"normalized": prediction}, + shape=prediction.shape, + obs=ad_mod1_test.obs, + var=ad_mod2.var, + uns={ + "dataset_id": ad_mod2.uns["dataset_id"], + "method_id": meta["functionality_name"], + } +) + +print("write predictions to file", flush=True) +out.write_h5ad(par["output"], compression="gzip") diff --git a/src/tasks/predict_modality/methods/guanlab_dengkw_pm/config.vsh.yaml b/src/tasks/predict_modality/methods/guanlab_dengkw_pm/config.vsh.yaml new file mode 100644 index 0000000000..8663123ad9 --- /dev/null +++ b/src/tasks/predict_modality/methods/guanlab_dengkw_pm/config.vsh.yaml @@ -0,0 +1,43 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: guanlab_dengkw_pm + info: + label: Guanlab-dengkw + 
summary: A kernel ridge regression method with RBF kernel.
+ description: |
+ This is a solution developed by Team Guanlab - dengkw in the NeurIPS 2021 competition to predict one modality
+ from another using kernel ridge regression (KRR) with RBF kernel. Truncated SVD is applied on the combined
+ training and test data from modality 1 followed by row-wise z-score normalization on the reduced matrix. The
+ truncated SVD of modality 2 is predicted by training a KRR model on the normalized training matrix of modality 1.
+ Predictions on the normalized test matrix are then re-mapped to the modality 2 feature space via the right
+ singular vectors.
+ preferred_normalization: log_cp10k
+ reference: lance2022multimodal
+ documentation_url: https://github.com/openproblems-bio/neurips2021_multimodal_topmethods/tree/main/src/predict_modality/methods/Guanlab-dengkw
+ repository_url: https://github.com/openproblems-bio/neurips2021_multimodal_topmethods/tree/main/src/predict_modality/methods/Guanlab-dengkw
+ competition_submission_id: 170636
+ arguments:
+ - name: "--distance_method"
+ type: "string"
+ default: "minkowski"
+ description: The distance metric to use. Possible values include `euclidean` and `minkowski`.
+ choices: [euclidean, minkowski]
+ - name: "--n_pcs"
+ type: "integer"
+ default: 50
+ description: Number of components to use for dimensionality reduction.
+ resources:
+ - type: python_script
+ path: script.py
+platforms:
+ - type: docker
+ image: openproblems/base_python:1.0.0
+ setup:
+ - type: python
+ packages:
+ - scikit-learn
+ - pandas
+ - numpy
+ - type: nextflow
+ directives:
+ label: [hightime, highmem, highcpu]
diff --git a/src/tasks/predict_modality/methods/guanlab_dengkw_pm/script.py b/src/tasks/predict_modality/methods/guanlab_dengkw_pm/script.py
new file mode 100644
index 0000000000..aafd2948c8
--- /dev/null
+++ b/src/tasks/predict_modality/methods/guanlab_dengkw_pm/script.py
@@ -0,0 +1,136 @@
+import anndata as ad
+import numpy as np
+from scipy.sparse import csc_matrix
+from sklearn.decomposition import TruncatedSVD
+from sklearn.gaussian_process.kernels import RBF
+from sklearn.kernel_ridge import KernelRidge
+
+## VIASH START
+par = {
+ 'input_train_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/train_mod1.h5ad',
+ 'input_train_mod2': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/train_mod2.h5ad',
+ 'input_test_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/test_mod1.h5ad',
+ 'output': 'output.h5ad',
+ 'distance_method': 'minkowski',
+ 'n_pcs': 50
+}
+meta = {
+ 'functionality_name': 'guanlab_dengkw_pm'
+}
+## VIASH END
+
+
+## Removed PCA and normalization steps, as they are already performed with the input data
+print('Reading input files', flush=True)
+input_train_mod1 = ad.read_h5ad(par['input_train_mod1'])
+input_train_mod2 = ad.read_h5ad(par['input_train_mod2'])
+input_test_mod1 = ad.read_h5ad(par['input_test_mod1'])
+
+batches = input_train_mod1.obs.batch.unique().tolist()
+batch_len = len(batches)
+
+# combine the train and test data
+input_train = ad.concat(
+ {"train": input_train_mod1, "test": input_test_mod1},
+ axis=0,
+ join="outer",
+ label="group",
+ fill_value=0,
+ index_unique="-"
+)
+
+print('Determine parameters by the modalities', flush=True)
+mod1_type = input_train_mod1.uns["modality"].upper()
+mod2_type = input_train_mod2.uns["modality"].upper()
+n_comp_dict = {
+ ("GEX", "ADT"): (300, 70, 10, 0.2),
+ ("ADT", "GEX"): (None, 50, 10,
0.2), + ("GEX", "ATAC"): (1000, 50, 10, 0.1), + ("ATAC", "GEX"): (100, 70, 10, 0.1) +} +print(f"{mod1_type}, {mod2_type}", flush=True) +n_mod1, n_mod2, scale, alpha = n_comp_dict[(mod1_type, mod2_type)] +print(f"{n_mod1}, {n_mod2}, {scale}, {alpha}", flush=True) + +# Perform PCA on the input data +print('Models using the Truncated SVD to reduce the dimension', flush=True) + +if n_mod1 is not None and n_mod1 < input_train.n_vars: + embedder_mod1 = TruncatedSVD(n_components=n_mod1) + mod1_pca = embedder_mod1.fit_transform(input_train.layers["normalized"]).astype(np.float32) + train_matrix = mod1_pca[input_train.obs['group'] == 'train'] + test_matrix = mod1_pca[input_train.obs['group'] == 'test'] +else: + train_matrix = input_train_mod1.to_df(layer="normalized").values.astype(np.float32) + test_matrix = input_test_mod1.to_df(layer="normalized").values.astype(np.float32) + +if n_mod2 is not None and n_mod2 < input_train_mod2.n_vars: + embedder_mod2 = TruncatedSVD(n_components=n_mod2) + train_gs = embedder_mod2.fit_transform(input_train_mod2.layers["normalized"]).astype(np.float32) +else: + train_gs = input_train_mod2.to_df(layer="normalized").values.astype(np.float32) + +del input_train + +print('Running normalization ...', flush=True) +train_sd = np.std(train_matrix, axis=1).reshape(-1, 1) +train_sd[train_sd == 0] = 1 +train_norm = (train_matrix - np.mean(train_matrix, axis=1).reshape(-1, 1)) / train_sd +train_norm = train_norm.astype(np.float32) +del train_matrix + +test_sd = np.std(test_matrix, axis=1).reshape(-1, 1) +test_sd[test_sd == 0] = 1 +test_norm = (test_matrix - np.mean(test_matrix, axis=1).reshape(-1, 1)) / test_sd +test_norm = test_norm.astype(np.float32) +del test_matrix + +print('Running KRR model ...', flush=True) +if batch_len == 1: + # just in case there is only one batch + batch_subsets = [batches] +elif mod1_type == "ADT" or mod2_type == "ADT": + # two fold consensus predictions + batch_subsets = [ + batches[:batch_len//2], + batches[batch_len//2:] + ] +else: + # leave-one-batch-out consensus predictions + batch_subsets = [ + batches[:i] + batches[i+1:] + for i in range(batch_len) + ] + +y_pred = np.zeros((input_test_mod1.n_obs, input_train_mod2.n_vars), dtype=np.float32) +for batch in batch_subsets: + print(batch, flush=True) + kernel = RBF(length_scale = scale) + krr = KernelRidge(alpha=alpha, kernel=kernel) + print('Fitting KRR ... ', flush=True) + krr.fit( + train_norm[input_train_mod1.obs.batch.isin(batch)], + train_gs[input_train_mod2.obs.batch.isin(batch)] + ) + y_pred += (krr.predict(test_norm) @ embedder_mod2.components_) + +np.clip(y_pred, a_min=0, a_max=None, out=y_pred) +y_pred /= len(batch_subsets) + +# Store as sparse matrix to be efficient. +# Note that this might require different classifiers/embedders before-hand. +# Not every class is able to support such data structures. +## Changed from csr to csc matrix as this is more supported. 
+y_pred = csc_matrix(y_pred) + +print("Write output AnnData to file", flush=True) +output = ad.AnnData( + layers = { 'normalized': y_pred }, + obs = input_test_mod1.obs[[]], + var = input_train_mod2.var[[]], + uns = { + 'dataset_id': input_train_mod1.uns['dataset_id'], + 'method_id': meta['functionality_name'] + } +) +output.write_h5ad(par['output'], compression='gzip') diff --git a/src/tasks/predict_modality/methods/knnr_py/config.vsh.yaml b/src/tasks/predict_modality/methods/knnr_py/config.vsh.yaml new file mode 100644 index 0000000000..543ee71fa1 --- /dev/null +++ b/src/tasks/predict_modality/methods/knnr_py/config.vsh.yaml @@ -0,0 +1,33 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: knnr_py + info: + label: KNNR (Py) + summary: K-nearest neighbor regression in Python. + description: K-nearest neighbor regression in Python. + reference: fix1989discriminatory + documentation_url: https://scikit-learn.org/stable/modules/neighbors.html + repository_url: https://github.com/scikit-learn/scikit-learn + preferred_normalization: log_cp10k + arguments: + - name: "--distance_method" + type: "string" + default: "minkowski" + description: The distance metric to use. Possible values include `euclidean` and `minkowski`. + - name: "--n_pcs" + type: "integer" + default: 50 + description: Number of components to use for dimensionality reduction. + - name: "--n_neighbors" + type: "integer" + default: 100 + description: Number of neighbors to use. + resources: + - type: python_script + path: script.py +platforms: + - type: docker + image: openproblems/base_python:1.0.0 + - type: nextflow + directives: + label: [hightime, lowmem, lowcpu] diff --git a/src/tasks/predict_modality/methods/knnr_py/script.py b/src/tasks/predict_modality/methods/knnr_py/script.py new file mode 100644 index 0000000000..f08c335ffe --- /dev/null +++ b/src/tasks/predict_modality/methods/knnr_py/script.py @@ -0,0 +1,67 @@ +import anndata as ad +from scipy.sparse import csc_matrix +from sklearn.decomposition import TruncatedSVD +from sklearn.neighbors import KNeighborsRegressor + +## VIASH START +par = { + 'input_train_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod1.h5ad', + 'input_train_mod2': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/train_mod2.h5ad', + 'input_test_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/test_mod1.h5ad', + 'distance_method': 'minkowski', + 'output': 'output.h5ad', + 'n_pcs': 4, + 'n_neighbors': 5, +} +meta = { 'functionality_name': 'foo' } +## VIASH END + +print('Reading `h5ad` files...', flush=True) +input_train_mod1 = ad.read_h5ad(par['input_train_mod1']) +input_train_mod2 = ad.read_h5ad(par['input_train_mod2']) +input_test_mod1 = ad.read_h5ad(par['input_test_mod1']) + +input_train = ad.concat( + {"train": input_train_mod1, "test": input_test_mod1}, + axis=0, + join="outer", + label="group", + fill_value=0, + index_unique="-" +) + +print('Performing dimensionality reduction on modality 1 values...', flush=True) +embedder = TruncatedSVD(n_components=par['n_pcs']) +X = embedder.fit_transform(input_train.layers["normalized"]) + +# split dimred back up +X_train = X[input_train.obs['group'] == 'train'] +X_test = X[input_train.obs['group'] == 'test'] +y_train = input_train_mod2.layers["normalized"].toarray() + +assert len(X_train) + len(X_test) == len(X) + +print('Running KNN regression...', flush=True) + +reg = KNeighborsRegressor( + n_neighbors=par['n_neighbors'], + metric=par['distance_method'] +) + 
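+# fit the regressor on the SVD embedding of the training cells and predict mod2 profiles for the test cells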
+reg.fit(X_train, y_train)
+y_pred = reg.predict(X_test)
+
+y_pred = csc_matrix(y_pred)
+
+adata = ad.AnnData(
+ layers={"normalized": y_pred},
+ obs=input_test_mod1.obs,
+ var=input_train_mod2.var,
+ uns={
+ 'dataset_id': input_train_mod1.uns['dataset_id'],
+ 'method_id': meta["functionality_name"],
+ },
+)
+
+print('Storing annotated data...', flush=True)
+adata.write_h5ad(par['output'], compression = "gzip")
diff --git a/src/tasks/predict_modality/methods/knnr_r/config.vsh.yaml b/src/tasks/predict_modality/methods/knnr_r/config.vsh.yaml
new file mode 100644
index 0000000000..448b3ca0b8
--- /dev/null
+++ b/src/tasks/predict_modality/methods/knnr_r/config.vsh.yaml
@@ -0,0 +1,36 @@
+__merge__: ../../api/comp_method.yaml
+functionality:
+ name: knnr_r
+ info:
+ label: KNNR (R)
+ summary: K-nearest neighbor regression in R.
+ description: K-nearest neighbor regression in R.
+ reference: fix1989discriminatory
+ documentation_url: https://cran.r-project.org/package=FNN
+ repository_url: https://github.com/cran/FNN
+ preferred_normalization: log_cp10k
+ arguments:
+ - name: "--distance_method"
+ type: "string"
+ default: "spearman"
+ description: The distance method to use. Possible values are euclidean, pearson, spearman and others.
+ - name: "--n_pcs"
+ type: "integer"
+ default: 50
+ description: Number of principal components to use.
+ - name: "--n_neighbors"
+ type: "integer"
+ default: 20
+ description: Number of neighbors to use in the knn regression.
+ resources:
+ - type: r_script
+ path: script.R
+platforms:
+ - type: docker
+ image: openproblems/base_r:1.0.0
+ setup:
+ - type: r
+ cran: [ lmds, FNN, proxyC]
+ - type: nextflow
+ directives:
+ label: [hightime, lowmem, lowcpu]
diff --git a/src/tasks/predict_modality/methods/knnr_r/script.R b/src/tasks/predict_modality/methods/knnr_r/script.R
new file mode 100644
index 0000000000..5679f8dd2d
--- /dev/null
+++ b/src/tasks/predict_modality/methods/knnr_r/script.R
@@ -0,0 +1,81 @@
+cat("Loading dependencies\n")
+requireNamespace("anndata", quietly = TRUE)
+library(Matrix, warn.conflicts = FALSE, quietly = TRUE)
+
+## VIASH START
+path <- "output/datasets/predict_modality/openproblems_bmmc_multiome_phase1_mod1/openproblems_bmmc_multiome_phase1_mod1.censor_dataset.output_"
+par <- list(
+ input_train_mod1 = paste0(path, "train_mod1.h5ad"),
+ input_test_mod1 = paste0(path, "test_mod1.h5ad"),
+ input_train_mod2 = paste0(path, "train_mod2.h5ad"),
+ output = "output.h5ad",
+ n_pcs = 4L,
+ n_neighbors = 3,
+ distance_method = "pearson"
+)
+meta <- list(functionality_name = "foo")
+## VIASH END
+
+cat("Reading mod1 h5ad files\n")
+input_train_mod1 <- anndata::read_h5ad(par$input_train_mod1)
+dataset_id <- input_train_mod1$uns[["dataset_id"]]
+
+# subset to HVG to reduce memory consumption
+train_mod1_sd <- proxyC::colSds(input_train_mod1$layers[["normalized"]])
+ix <- order(train_mod1_sd, decreasing = TRUE)[seq_len(min(1000, length(train_mod1_sd)))]
+input_train_mod1 <- input_train_mod1[,ix]$copy()
+gc()
+
+# subset to HVG to reduce memory consumption
+input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1)
+input_test_mod1 <- input_test_mod1[,ix]$copy()
+gc()
+
+cat("Performing DR on the mod1 values\n")
+# LMDS is more efficient than regular MDS because
+# it does not compute a square distance matrix.
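+# train and test cells are embedded together so they share one coordinate space;
+# the embedding is split back into train and test below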
+dr_mod1 <- lmds::lmds( + rbind(input_train_mod1$layers[["normalized"]], input_test_mod1$layers[["normalized"]]), + ndim = par$n_pcs, + distance_method = par$distance_method +) + +ix <- seq_len(nrow(input_train_mod1)) +dr_mod1_train <- dr_mod1[ix, , drop = FALSE] +dr_mod1_test <- dr_mod1[-ix, , drop = FALSE] + +# remove previous objects to save memory +rm(input_train_mod1, input_test_mod1) +gc() + +cat("Reading mod2 h5ad files\n") +input_train_mod2 <- anndata::read_h5ad(par$input_train_mod2) + +cat("Predicting for each column in modality 2\n") +# precompute knn indices +knn_ix <- FNN::get.knnx( + dr_mod1_train, + dr_mod1_test, + k = par$n_neighbors +)$nn.index + +# perform knn regression. +pred <- input_train_mod2$layers[["normalized"]][knn_ix[, 1], , drop = FALSE] +if (par$n_neighbors > 1) { + for (k in seq(2, par$n_neighbors)) { + pred <- pred + input_train_mod2$layers[["normalized"]][knn_ix[, k], , drop = FALSE] + } +} +pred <- pred / par$n_neighbors +rownames(pred) <- rownames(dr_mod1_test) + +out <- anndata::AnnData( + layers = list(normalized = pred), + shape = dim(pred), + uns = list( + dataset_id = dataset_id, + method_id = meta$functionality_name + ) +) + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/methods/lm/config.vsh.yaml b/src/tasks/predict_modality/methods/lm/config.vsh.yaml new file mode 100644 index 0000000000..3fdbc0f243 --- /dev/null +++ b/src/tasks/predict_modality/methods/lm/config.vsh.yaml @@ -0,0 +1,32 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: lm + info: + label: Linear Model + summary: Linear model regression. + description: A linear model regression method. + reference: wilkinson1973symbolic + repository_url: https://github.com/RcppCore/RcppArmadillo + documentation_url: https://cran.r-project.org/package=RcppArmadillo + preferred_normalization: log_cp10k + arguments: + - name: "--distance_method" + type: "string" + default: "spearman" + description: The distance method to use. Possible values are euclidean, pearson, spearman and others. + - name: "--n_pcs" + type: "integer" + default: 50 + description: Number of principal components to use. 
+ resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ lmds, RcppArmadillo, pbapply] + - type: nextflow + directives: + label: [hightime, highmem, highcpu] diff --git a/src/tasks/predict_modality/methods/lm/script.R b/src/tasks/predict_modality/methods/lm/script.R new file mode 100644 index 0000000000..58d3febfb5 --- /dev/null +++ b/src/tasks/predict_modality/methods/lm/script.R @@ -0,0 +1,74 @@ +cat("Loading dependencies\n") +requireNamespace("anndata", quietly = TRUE) +requireNamespace("pbapply", quietly = TRUE) +library(Matrix, warn.conflicts = FALSE, quietly = TRUE) + +## VIASH START +path <- "output/datasets/predict_modality/openproblems_bmmc_multiome_phase1_mod1/openproblems_bmmc_multiome_phase1_mod1.censor_dataset.output_" +par <- list( + input_train_mod1 = paste0(path, "train_mod1.h5ad"), + input_test_mod1 = paste0(path, "test_mod1.h5ad"), + input_train_mod2 = paste0(path, "train_mod2.h5ad"), + output = "output.h5ad", + n_pcs = 4L +) +meta <- list(functionality_name = "foo") +## VIASH END + +n_cores <- parallel::detectCores(all.tests = FALSE, logical = TRUE) + +cat("Reading mod1 files\n") +input_train_mod1 <- anndata::read_h5ad(par$input_train_mod1) +input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1) + + +cat("Performing DR on the mod1 values\n") +dr <- lmds::lmds( + rbind(input_train_mod1$layers[["normalized"]], input_test_mod1$layers[["normalized"]]), + ndim = par$n_pcs, + distance_method = par$distance_method +) + +ix <- seq_len(nrow(input_train_mod1)) +dr_train <- dr[ix, , drop = FALSE] +dr_test <- dr[-ix, , drop = FALSE] + +rm(input_test_mod1) +gc() + + +cat("Reading mod2 files\n") +X_mod2 <- anndata::read_h5ad(par$input_train_mod2)$layers[["normalized"]] + +cat("Predicting for each column in modality 2\n") +preds <- pbapply::pblapply( + seq_len(ncol(X_mod2)), + function(i) { + y <- X_mod2[, i] + uy <- unique(y) + if (length(uy) > 1) { + fit <- RcppArmadillo::fastLm(dr_train, y) + # fit <- lm(y ~ ., dr_train) + stats::predict(fit, dr_test) + } else { + rep(uy, nrow(dr_test)) + } + } +) + +cat("Creating outputs object\n") +prediction <- Matrix::Matrix(do.call(cbind, preds), sparse = TRUE) +rownames(prediction) <- rownames(dr_test) +colnames(prediction) <- colnames(X_mod2) + +out <- anndata::AnnData( + layers = list(normalized = prediction), + shape = dim(prediction), + uns = list( + dataset_id = input_train_mod1$uns[["dataset_id"]], + method_id = meta$functionality_name + ) +) + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/methods/lmds_irlba_rf/config.vsh.yaml b/src/tasks/predict_modality/methods/lmds_irlba_rf/config.vsh.yaml new file mode 100644 index 0000000000..0ed08b89aa --- /dev/null +++ b/src/tasks/predict_modality/methods/lmds_irlba_rf/config.vsh.yaml @@ -0,0 +1,37 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: lmds_irlba_rf + info: + label: LMDS + IRLBA + RF + summary: A random forest regression using LMDS of modality 1 to predict a PCA embedding of modality 2, which is then reversed to predict the original modality 2. + description: | + A random forest regression using LMDS of modality 1 to predict a PCA embedding of modality 2, which is then reversed to predict the original modality 2. 
+ reference: lance2022multimodal
+ documentation_url: https://github.com/openproblems-bio/openproblems/tree/main/src/tasks/predict_modality/methods #/lmds_irlba_rf
+ repository_url: https://github.com/openproblems-bio/openproblems
+ preferred_normalization: log_cp10k
+ arguments:
+ - name: "--distance_method"
+ type: "string"
+ default: "pearson"
+ description: The distance method to use. Possible values are euclidean, pearson, spearman and others.
+ - name: "--n_pcs"
+ type: "integer"
+ default: 20
+ description: Number of principal components to use.
+ - name: "--n_trees"
+ type: "integer"
+ default: 500
+ description: Number of trees to use.
+ resources:
+ - type: r_script
+ path: script.R
+platforms:
+ - type: docker
+ image: openproblems/base_r:1.0.0
+ setup:
+ - type: r
+ cran: [lmds, ranger, pbapply, irlba]
+ - type: nextflow
+ directives:
+ label: [hightime, highmem, highcpu]
\ No newline at end of file
diff --git a/src/tasks/predict_modality/methods/lmds_irlba_rf/script.R b/src/tasks/predict_modality/methods/lmds_irlba_rf/script.R
new file mode 100644
index 0000000000..6a5b7ed595
--- /dev/null
+++ b/src/tasks/predict_modality/methods/lmds_irlba_rf/script.R
@@ -0,0 +1,93 @@
+cat("Loading dependencies\n")
+requireNamespace("anndata", quietly = TRUE)
+requireNamespace("pbapply", quietly = TRUE)
+library(Matrix, warn.conflicts = FALSE, quietly = TRUE)
+
+## VIASH START
+path <- "resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/"
+par <- list(
+ input_train_mod1 = paste0(path, "train_mod1.h5ad"),
+ input_test_mod1 = paste0(path, "test_mod1.h5ad"),
+ input_train_mod2 = paste0(path, "train_mod2.h5ad"),
+ output = "output.h5ad",
+ n_pcs = 20L,
+ n_trees = 50L
+)
+meta <- list(functionality_name = "foo")
+## VIASH END
+
+n_cores <- parallel::detectCores(all.tests = FALSE, logical = TRUE)
+
+cat("Reading mod1 files\n")
+input_train_mod1 <- anndata::read_h5ad(par$input_train_mod1)
+input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1)
+
+dataset_id <- input_train_mod1$uns[["dataset_id"]]
+
+cat("Performing DR on the mod1 values\n")
+dr <- lmds::lmds(
+ rbind(input_train_mod1$layers[["normalized"]], input_test_mod1$layers[["normalized"]]),
+ ndim = par$n_pcs,
+ distance_method = par$distance_method
+)
+# alternative:
+# pr_out <- irlba::prcomp_irlba(
+# rbind(input_train_mod1$layers[["normalized"]], input_test_mod1$layers[["normalized"]]),
+# n = par$n_pcs
+# )
+# dr <- pr_out$x
+
+# split up dr data
+ix <- seq_len(nrow(input_train_mod1))
+dr_train <- dr[ix, , drop = FALSE]
+dr_test <- dr[-ix, , drop = FALSE]
+
+rm(input_train_mod1, input_test_mod1)
+gc()
+
+
+cat("Reading mod2 files\n")
+X_mod2 <- anndata::read_h5ad(par$input_train_mod2)$layers[["normalized"]]
+prcomp_mod2 <- irlba::prcomp_irlba(X_mod2, n = par$n_pcs)
+dr_mod2 <- prcomp_mod2$x
+
+cat("Predicting for each column in modality 2\n")
+pred_drs <- pbapply::pblapply(
+ seq_len(ncol(dr_mod2)),
+ function(i) {
+ y <- dr_mod2[, i]
+ uy <- unique(y)
+ if (length(uy) > 1) {
+ rf <- ranger::ranger(
+ x = dr_train,
+ y = y,
+ num.trees = par$n_trees,
+ num.threads = n_cores
+ )
+ stats::predict(rf, dr_test)$predictions
+ } else {
+ rep(uy, nrow(dr_test))
+ }
+ }
+)
+
+cat("Creating outputs object\n")
+pred_dr <- Matrix::Matrix(do.call(cbind, pred_drs), sparse = TRUE)
+prediction <- pred_dr %*% t(prcomp_mod2$rotation)
+rownames(prediction) <- rownames(dr_test)
+colnames(prediction) <- colnames(X_mod2)
+
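+# assemble the output AnnData with the back-projected predictions and the
+# dataset/method identifiers expected by the metric components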
+out <- anndata::AnnData( + layers = list(normalized = as(prediction, "CsparseMatrix")), + shape = dim(prediction), + uns = list( + dataset_id = dataset_id, + method_id = meta$functionality_name + ) +) + + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/methods/newwave_knnr/config.vsh.yaml b/src/tasks/predict_modality/methods/newwave_knnr/config.vsh.yaml new file mode 100644 index 0000000000..385f1234bb --- /dev/null +++ b/src/tasks/predict_modality/methods/newwave_knnr/config.vsh.yaml @@ -0,0 +1,42 @@ +__merge__: ../../api/comp_method.yaml +functionality: + name: newwave_knnr + status: disabled # disabled due to poor performance and long execution times + info: + label: NewWave+KNNR + summary: Perform DR with NewWave, predict modality with KNN regression. + description: Perform DR with NewWave, predict modality with KNN regression. + reference: agostinis2022newwave + repository_url: https://github.com/fedeago/NewWave + documentation_url: https://bioconductor.org/packages/release/bioc/html/NewWave.html + preferred_normalization: log_cp10k + arguments: + - name: "--newwave_maxiter" + type: "integer" + default: 40 + description: Maximum number of NewWave iterations. + - name: "--newwave_ngene" + type: "integer" + default: 200 + description: Setting of the n_gene_par NewWave parameter. + - name: "--newwave_ncell" + type: "integer" + default: 200 + description: Setting of the n_cell_par NewWave parameter. + - name: "--n_neighbors" + type: "integer" + default: 20 + description: Number of neighbors to use in the knn regression. + resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ lmds, FNN, proxy, proxyC ] + bioc: [ SingleCellExperiment, NewWave ] + - type: nextflow + directives: + label: [hightime, highmem, highcpu, highsharedmem] diff --git a/src/tasks/predict_modality/methods/newwave_knnr/script.R b/src/tasks/predict_modality/methods/newwave_knnr/script.R new file mode 100644 index 0000000000..84f8a0b469 --- /dev/null +++ b/src/tasks/predict_modality/methods/newwave_knnr/script.R @@ -0,0 +1,107 @@ +cat("Loading dependencies\n") +requireNamespace("anndata", quietly = TRUE) +library(Matrix, warn.conflicts = FALSE, quietly = TRUE) +requireNamespace("NewWave", quietly = TRUE) +requireNamespace("FNN", quietly = TRUE) +requireNamespace("SingleCellExperiment", quietly = TRUE) + +## VIASH START +path <- "resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/" +par <- list( + input_train_mod1 = paste0(path, "train_mod1.h5ad"), + input_test_mod1 = paste0(path, "test_mod1.h5ad"), + input_train_mod2 = paste0(path, "train_mod2.h5ad"), + output = "output.h5ad", + newwave_maxiter = 40L, + newwave_ngene = 200L, + newwave_ncell = 200L, + n_neighbors = 20L +) +meta <- list(functionality_name = "foo") +## VIASH END + +print(par) + +n_cores <- parallel::detectCores(all.tests = FALSE, logical = TRUE) + +method_id <- meta$functionality_name + +cat("Reading h5ad files\n") +input_train_mod1 <- anndata::read_h5ad(par$input_train_mod1) +input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1) + +# fetch batch labels +batch1 <- c(as.character(input_train_mod1$obs$batch), as.character(input_test_mod1$obs$batch)) +batch2 <- as.character(input_train_mod1$obs$batch) + +# create SummarizedExperiment object +data1 <- SummarizedExperiment::SummarizedExperiment( + assays = list( + counts = as( + cbind( + 
t(input_train_mod1$layers[["counts"]]), + t(input_test_mod1$layers[["counts"]]) + ), + "CsparseMatrix" + ) + ), + colData = data.frame(batch = factor(batch1)) +) +data1 <- data1[Matrix::rowSums(SummarizedExperiment::assay(data1)) > 0, ] +rm(input_train_mod1, input_test_mod1) +gc() + +cat("Running NewWave on mod1\n") +res1 <- NewWave::newWave( + data1, + X = "~batch", + verbose = TRUE, + K = 10, + maxiter_optimize = par$newwave_maxiter, + n_gene_par = min(par$newwave_ngene, nrow(data1)), + n_cell_par = min(par$newwave_ncell, ncol(data1)), + commondispersion = FALSE +) +dr_mod1 <- SingleCellExperiment::reducedDim(res1) +colnames(dr_mod1) <- paste0("comp_", seq_len(ncol(dr_mod1))) +rm(data1) +gc() + +# split DR matrices +train_ix <- seq_along(batch2) +dr_mod1_train <- dr_mod1[train_ix, , drop = FALSE] +dr_mod1_test <- dr_mod1[-train_ix, , drop = FALSE] + + +cat("Predicting for each column in modality 2\n") +input_train_mod2 <- anndata::read_h5ad(par$input_train_mod2) + +# precompute knn indices +knn_ix <- FNN::get.knnx( + dr_mod1_train, + dr_mod1_test, + k = min(nrow(dr_mod1_train), par$n_neighbors) +)$nn.index + +# perform knn regression. +pred <- input_train_mod2$layers[["normalized"]][knn_ix[, 1], , drop = FALSE] +if (par$n_neighbors > 1) { + for (k in seq(2, par$n_neighbors)) { + pred <- pred + input_train_mod2$layers[["normalized"]][knn_ix[, k], , drop = FALSE] + } +} +pred <- pred / par$n_neighbors +rownames(pred) <- rownames(dr_mod1_test) + +cat("Creating outputs object\n") +out <- anndata::AnnData( + layers = list(normalized = pred), + shape = dim(pred), + uns = list( + dataset_id = input_train_mod2$uns[["dataset_id"]], + method_id = meta$functionality_name + ) +) + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/methods/novel/helper_functions.py b/src/tasks/predict_modality/methods/novel/helper_functions.py new file mode 100644 index 0000000000..17c57c9b3b --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/helper_functions.py @@ -0,0 +1,247 @@ +import torch + +from torch import nn +import torch.nn.functional as F + +from torch.utils.data import Dataset + +from typing import Optional + +import anndata +import numpy as np +import pandas as pd +import scipy.sparse +import sklearn.decomposition +import sklearn.feature_extraction.text +import sklearn.preprocessing +import sklearn.neighbors +import sklearn.utils.extmath + +class tfidfTransformer(): + def __init__(self): + self.idf = None + self.fitted = False + + def fit(self, X): + self.idf = X.shape[0] / X.sum(axis=0) + self.fitted = True + + def transform(self, X): + if not self.fitted: + raise RuntimeError('Transformer was not fitted on any data') + if scipy.sparse.issparse(X): + tf = X.multiply(1 / X.sum(axis=1)) + return tf.multiply(self.idf) + else: + tf = X / X.sum(axis=1, keepdims=True) + return tf * self.idf + + def fit_transform(self, X): + self.fit(X) + return self.transform(X) + +class lsiTransformer(): + def __init__(self, + n_components: int = 20, + use_highly_variable = None + ): + self.n_components = n_components + self.use_highly_variable = use_highly_variable + self.tfidfTransformer = tfidfTransformer() + self.normalizer = sklearn.preprocessing.Normalizer(norm="l1") + self.pcaTransformer = sklearn.decomposition.TruncatedSVD(n_components = self.n_components, random_state=777) + # self.lsi_mean = None + # self.lsi_std = None + self.fitted = None + + def fit(self, adata: anndata.AnnData): + if self.use_highly_variable is None: + 
self.use_highly_variable = "hvg" in adata.var + adata_use = adata[:, adata.var["hvg"]] if self.use_highly_variable else adata + X = self.tfidfTransformer.fit_transform(adata_use.X) + X_norm = self.normalizer.fit_transform(X) + X_norm = np.log1p(X_norm * 1e4) + X_lsi = self.pcaTransformer.fit_transform(X_norm) + # self.lsi_mean = X_lsi.mean(axis=1, keepdims=True) + # self.lsi_std = X_lsi.std(axis=1, ddof=1, keepdims=True) + self.fitted = True + + def transform(self, adata): + if not self.fitted: + raise RuntimeError('Transformer was not fitted on any data') + adata_use = adata[:, adata.var["hvg"]] if self.use_highly_variable else adata + X = self.tfidfTransformer.transform(adata_use.X) + X_norm = self.normalizer.transform(X) + X_norm = np.log1p(X_norm * 1e4) + X_lsi = self.pcaTransformer.transform(X_norm) + X_lsi -= X_lsi.mean(axis=1, keepdims=True) + X_lsi /= X_lsi.std(axis=1, ddof=1, keepdims=True) + lsi_df = pd.DataFrame(X_lsi, index = adata_use.obs_names) + return lsi_df + + def fit_transform(self, adata): + self.fit(adata) + return self.transform(adata) + +class ModalityMatchingDataset(Dataset): + def __init__( + self, df_modality1, df_modality2, is_train=True + ): + super().__init__() + self.df_modality1 = df_modality1 + self.df_modality2 = df_modality2 + self.is_train = is_train + def __len__(self): + return self.df_modality1.shape[0] + + def __getitem__(self, index: int): + if self.is_train == True: + x = self.df_modality1.iloc[index].values + y = self.df_modality2.iloc[index].values + return x, y + else: + x = self.df_modality1.iloc[index].values + return x + +class Swish(torch.autograd.Function): + @staticmethod + def forward(ctx, i): + result = i * sigmoid(i) + ctx.save_for_backward(i) + return result + @staticmethod + def backward(ctx, grad_output): + i = ctx.saved_variables[0] + sigmoid_i = sigmoid(i) + return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i))) + +class Swish_module(nn.Module): + def forward(self, x): + return Swish.apply(x) + +sigmoid = torch.nn.Sigmoid() + +class ModelRegressionGex2Atac(nn.Module): + def __init__(self, dim_mod1, dim_mod2): + super(ModelRegressionGex2Atac, self).__init__() + #self.bn = torch.nn.BatchNorm1d(1024) + self.input_ = nn.Linear(dim_mod1, 1024) + self.fc = nn.Linear(1024, 256) + self.fc1 = nn.Linear(256, 2048) + self.dropout1 = nn.Dropout(p=0.298885630228993) + self.dropout2 = nn.Dropout(p=0.11289717442776658) + self.dropout3 = nn.Dropout(p=0.13523634924414762) + self.output = nn.Linear(2048, dim_mod2) + def forward(self, x): + x = F.gelu(self.input_(x)) + x = self.dropout1(x) + x = F.gelu(self.fc(x)) + x = self.dropout2(x) + x = F.gelu(self.fc1(x)) + x = self.dropout3(x) + x = F.gelu(self.output(x)) + return x + +class ModelRegressionAtac2Gex(nn.Module): # + def __init__(self, dim_mod1, dim_mod2): + super(ModelRegressionAtac2Gex, self).__init__() + self.input_ = nn.Linear(dim_mod1, 2048) + self.fc = nn.Linear(2048, 2048) + self.fc1 = nn.Linear(2048, 512) + self.dropout1 = nn.Dropout(p=0.2649138776004753) + self.dropout2 = nn.Dropout(p=0.1769628308148758) + self.dropout3 = nn.Dropout(p=0.2516791883012817) + self.output = nn.Linear(512, dim_mod2) + def forward(self, x): + x = F.gelu(self.input_(x)) + x = self.dropout1(x) + x = F.gelu(self.fc(x)) + x = self.dropout2(x) + x = F.gelu(self.fc1(x)) + x = self.dropout3(x) + x = F.gelu(self.output(x)) + return x + +class ModelRegressionAdt2Gex(nn.Module): + def __init__(self, dim_mod1, dim_mod2): + super(ModelRegressionAdt2Gex, self).__init__() + self.input_ = nn.Linear(dim_mod1, 512) + 
self.dropout1 = nn.Dropout(p=0.0) + self.swish = Swish_module() + self.fc = nn.Linear(512, 512) + self.fc1 = nn.Linear(512, 512) + self.fc2 = nn.Linear(512, 512) + self.output = nn.Linear(512, dim_mod2) + def forward(self, x): + x = F.gelu(self.input_(x)) + x = F.gelu(self.fc(x)) + x = F.gelu(self.fc1(x)) + x = F.gelu(self.fc2(x)) + x = F.gelu(self.output(x)) + return x + +class ModelRegressionGex2Adt(nn.Module): + def __init__(self, dim_mod1, dim_mod2): + super(ModelRegressionGex2Adt, self).__init__() + self.input_ = nn.Linear(dim_mod1, 512) + self.dropout1 = nn.Dropout(p=0.20335661386636347) + self.dropout2 = nn.Dropout(p=0.15395289261127876) + self.dropout3 = nn.Dropout(p=0.16902655078832815) + self.fc = nn.Linear(512, 512) + self.fc1 = nn.Linear(512, 2048) + self.output = nn.Linear(2048, dim_mod2) + def forward(self, x): + # x = self.batchswap_noise(x) + x = F.gelu(self.input_(x)) + x = self.dropout1(x) + x = F.gelu(self.fc(x)) + x = self.dropout2(x) + x = F.gelu(self.fc1(x)) + x = self.dropout3(x) + x = F.gelu(self.output(x)) + return x + +def rmse(y, y_pred): + return np.sqrt(np.mean(np.square(y - y_pred))) + +def train_and_valid(model, optimizer, loss_fn, dataloader_train, dataloader_test, name_model, device): + best_score = 100000 + for i in range(100): + train_losses = [] + test_losses = [] + model.train() + + for x, y in dataloader_train: + optimizer.zero_grad() + output = model(x.float().to(device)) + loss = torch.sqrt(loss_fn(output, y.float().to(device))) + loss.backward() + train_losses.append(loss.item()) + optimizer.step() + + model.eval() + with torch.no_grad(): + for x, y in dataloader_test: + output = model(x.float().to(device)) + output[output<0] = 0.0 + loss = torch.sqrt(loss_fn(output, y.float().to(device))) + test_losses.append(loss.item()) + + outputs = [] + targets = [] + model.eval() + with torch.no_grad(): + for x, y in dataloader_test: + output = model(x.float().to(device)) + + outputs.append(output.detach().cpu().numpy()) + targets.append(y.float().detach().cpu().numpy()) + cat_outputs = np.concatenate(outputs) + cat_targets = np.concatenate(targets) + cat_outputs[cat_outputs<0.0] = 0 + + if best_score > rmse(cat_targets,cat_outputs): + torch.save(model.state_dict(), name_model) + best_score = rmse(cat_targets,cat_outputs) + print("best rmse: ", best_score) + diff --git a/src/tasks/predict_modality/methods/novel/predict/config.vsh.yaml b/src/tasks/predict_modality/methods/novel/predict/config.vsh.yaml new file mode 100644 index 0000000000..72e3292407 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/predict/config.vsh.yaml @@ -0,0 +1,25 @@ +__merge__: ../../../api/comp_method_predict.yaml +functionality: + name: novel_predict + arguments: + - name: "--input_transform" + type: file + direction: input + required: false + example: "lsi_transformer.pickle" + resources: + - type: python_script + path: script.py + - path: ../helper_functions.py +platforms: + - type: docker + image: openproblems/base_pytorch_nvidia:1.0.0 + setup: + - type: python + packages: + - scikit-learn + - networkx + - type: nextflow + directives: + label: [highmem, hightime, midcpu, highsharedmem, gpu] + diff --git a/src/tasks/predict_modality/methods/novel/predict/run_test.sh b/src/tasks/predict_modality/methods/novel/predict/run_test.sh new file mode 100644 index 0000000000..af5550e5d7 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/predict/run_test.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +viash run src/tasks/predict_modality/methods/novel/predict/config.vsh.yaml -- \ + 
--input_train_mod2 'resources/predict_modality/datasets/openproblems_neurips2021/bmmc_cite/normal/log_cp10k/train_mod2.h5ad' \
+  --input_test_mod1 'resources/predict_modality/datasets/openproblems_neurips2021/bmmc_cite/normal/log_cp10k/test_mod1.h5ad' \
+  --input_model output/novel/model.pt \
+  --input_transform output/novel/lsi_transform.pickle \
+  --output 'output/novel/novel_test.h5ad'
\ No newline at end of file
diff --git a/src/tasks/predict_modality/methods/novel/predict/script.py b/src/tasks/predict_modality/methods/novel/predict/script.py
new file mode 100644
index 0000000000..5f336ce7b0
--- /dev/null
+++ b/src/tasks/predict_modality/methods/novel/predict/script.py
@@ -0,0 +1,119 @@
+import sys
+import torch
+from torch.utils.data import DataLoader
+
+import anndata as ad
+import pickle
+import numpy as np
+from scipy.sparse import csc_matrix
+
+# check gpu available
+if (torch.cuda.is_available()):
+    device = 'cuda:0' # switch to current device
+    print('current device: gpu', flush=True)
+else:
+    device = 'cpu'
+    print('current device: cpu', flush=True)
+
+
+## VIASH START
+
+par = {
+    'input_train_mod2': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod2.h5ad',
+    'input_test_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/test_mod1.h5ad',
+    'input_model': 'resources_test/predict_modality/neurips2021_bmmc_cite/model.pt',
+    'input_transform': 'transformer.pickle'
+}
+meta = {
+    'resources_dir': 'src/tasks/predict_modality/methods/novel',
+    'functionality_name': '171129'
+}
+## VIASH END
+
+sys.path.append(meta['resources_dir'])
+from helper_functions import ModelRegressionAtac2Gex, ModelRegressionAdt2Gex, ModelRegressionGex2Adt, ModelRegressionGex2Atac, ModalityMatchingDataset
+
+print("Load data", flush=True)
+
+input_test_mod1 = ad.read_h5ad(par['input_test_mod1'])
+input_train_mod2 = ad.read_h5ad(par['input_train_mod2'])
+
+mod1 = input_test_mod1.uns['modality']
+mod2 = input_train_mod2.uns['modality']
+
+n_vars_mod1 = input_train_mod2.uns["model_dim"]["mod1"]
+n_vars_mod2 = input_train_mod2.uns["model_dim"]["mod2"]
+
+input_test_mod1.X = input_test_mod1.layers['normalized'].tocsr()
+
+# Remove vars that were removed from the training set. Mostly only applicable for testing.
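+# The train component records the names of the all-zero variables it dropped in
+# uns["removed_vars"] of the mod2 training file; dropping the same columns here
+# keeps the test matrix compatible with the fitted LSI transform and the model's
+# input dimension. For example (illustrative name only), if uns["removed_vars"]
+# == ["some_gene"], that column is removed from input_test_mod1 before transforming.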
+if input_train_mod2.uns.get("removed_vars"): + rem_var = input_train_mod2.uns["removed_vars"] + input_test_mod1 = input_test_mod1[:, ~input_test_mod1.var_names.isin(rem_var)] + +del input_train_mod2 + + +model_fp = par['input_model'] + +print("Start predict", flush=True) + +if mod1 == 'GEX' and mod2 == 'ADT': + model = ModelRegressionGex2Adt(n_vars_mod1,n_vars_mod2) + weight = torch.load(model_fp, map_location='cpu') + with open(par['input_transform'], 'rb') as f: + lsi_transformer_gex = pickle.load(f) + + model.load_state_dict(weight) + input_test_mod1_ = lsi_transformer_gex.transform(input_test_mod1) + +elif mod1 == 'GEX' and mod2 == 'ATAC': + model = ModelRegressionGex2Atac(n_vars_mod1,n_vars_mod2) + weight = torch.load(model_fp, map_location='cpu') + with open(par['input_transform'], 'rb') as f: + lsi_transformer_gex = pickle.load(f) + + model.load_state_dict(weight) + input_test_mod1_ = lsi_transformer_gex.transform(input_test_mod1) + +elif mod1 == 'ATAC' and mod2 == 'GEX': + model = ModelRegressionAtac2Gex(n_vars_mod1,n_vars_mod2) + weight = torch.load(model_fp, map_location='cpu') + with open(par['input_transform'], 'rb') as f: + lsi_transformer_gex = pickle.load(f) + + model.load_state_dict(weight) + input_test_mod1_ = lsi_transformer_gex.transform(input_test_mod1) + +elif mod1 == 'ADT' and mod2 == 'GEX': + model = ModelRegressionAdt2Gex(n_vars_mod1,n_vars_mod2) + weight = torch.load(model_fp, map_location='cpu') + + model.load_state_dict(weight) + input_test_mod1_ = input_test_mod1.to_df() + +dataset_test = ModalityMatchingDataset(input_test_mod1_, None, is_train=False) +dataloader_test = DataLoader(dataset_test, 32, shuffle = False, num_workers = 4) + +outputs = [] +model.eval() +with torch.no_grad(): + for x in dataloader_test: + output = model(x.float()) + outputs.append(output.detach().cpu().numpy()) + +outputs = np.concatenate(outputs) +outputs[outputs<0] = 0 +outputs = csc_matrix(outputs) + +adata = ad.AnnData( + layers={"normalized": outputs}, + shape=outputs.shape, + uns={ + 'dataset_id': input_test_mod1.uns['dataset_id'], + 'method_id': meta['functionality_name'], + }, +) +adata.write_h5ad(par['output'], compression = "gzip") + + diff --git a/src/tasks/predict_modality/methods/novel/run/config.vsh.yaml b/src/tasks/predict_modality/methods/novel/run/config.vsh.yaml new file mode 100644 index 0000000000..682782e059 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/run/config.vsh.yaml @@ -0,0 +1,21 @@ +__merge__: ../../../api/comp_method.yaml +functionality: + name: novel + info: + label: Novel + summary: A method using encoder-decoder MLP model + description: This method trains an encoder-decoder MLP model with one output neuron per component in the target. As an input, the encoders use representations obtained from ATAC and GEX data via LSI transform and raw ADT data. The hyperparameters of the models were found via broad hyperparameter search using the Optuna framework. 
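+    # Note: one MLP is trained per direction (GEX->ADT, GEX->ATAC, ATAC->GEX,
+    # ADT->GEX); GEX and ATAC inputs are LSI-reduced first, ADT inputs are used
+    # as-is. This component only chains the novel_train and novel_predict steps
+    # (see `dependencies` and main.nf below).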
+ documentation_url: https://github.com/openproblems-bio/neurips2021_multimodal_topmethods/tree/main/src/predict_modality/methods/novel#readme + repository_url: https://github.com/openproblems-bio/neurips2021_multimodal_topmethods/tree/main/src/predict_modality/methods/novel + reference: pmlr-v176-lance2022multimodal + submission_id: "169769" + preferred_normalization: log_cp10k + resources: + - path: main.nf + type: nextflow_script + entrypoint: run_wf + dependencies: + - name: predict_modality/methods/novel_train + - name: predict_modality/methods/novel_predict +platforms: + - type: nextflow \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/novel/run/main.nf b/src/tasks/predict_modality/methods/novel/run/main.nf new file mode 100644 index 0000000000..59111194cb --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/run/main.nf @@ -0,0 +1,25 @@ +workflow run_wf { + take: input_ch + main: + output_ch = input_ch + | novel_train.run( + fromState: ["input_train_mod1", "input_train_mod2"], + toState: ["input_model": "output", "input_transform": "output_transform", "output_train_mod2": "output_train_mod2"] + ) + | novel_predict.run( + fromState: { id, state -> + [ + "input_train_mod2": state.output_train_mod2, + "input_test_mod1": state.input_test_mod1, + "input_model": state.input_model, + "input_transform": state.input_transform, + "output": state.output]}, + toState: ["output": "output"] + ) + + | map { tup -> + [tup[0], [output: tup[1].output]] + } + + emit: output_ch +} \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/novel/run/run_test.sh b/src/tasks/predict_modality/methods/novel/run/run_test.sh new file mode 100644 index 0000000000..f6da6b0863 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/run/run_test.sh @@ -0,0 +1,15 @@ +REPO_ROOT=$(git rev-parse --show-toplevel) + +# ensure that the command below is run from the root of the repository +cd "$REPO_ROOT" + +set -e + +nextflow run . 
\ + -main-script target/nextflow/predict_modality/methods/novel/main.nf \ + -profile docker \ + -c src/wf_utils/labels_ci.config \ + --input_train_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod1.h5ad \ + --input_train_mod2 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod2.h5ad \ + --input_test_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/test_mod1.h5ad \ + --publish_dir output/novel/nextflow diff --git a/src/tasks/predict_modality/methods/novel/train/config.vsh.yaml b/src/tasks/predict_modality/methods/novel/train/config.vsh.yaml new file mode 100644 index 0000000000..87ea471301 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/train/config.vsh.yaml @@ -0,0 +1,31 @@ +__merge__: ../../../api/comp_method_train.yaml +functionality: + name: novel_train + arguments: + - name: --output_transform + type: file + description: "The output transform file" + required: false + default: "lsi_transformer.pickle" + direction: output + - name: --output_train_mod2 + type: file + description: copy of the input with model dim in `.uns` + direction: output + default: "train_mod2.h5ad" + required: false + resources: + - path: script.py + type: python_script + - path: ../helper_functions.py +platforms: + - type: docker + image: openproblems/base_pytorch_nvidia:1.0.0 + setup: + - type: python + packages: + - scikit-learn + - networkx + - type: nextflow + directives: + label: [highmem, hightime, midcpu, highsharedmem, gpu] \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/novel/train/run_test.sh b/src/tasks/predict_modality/methods/novel/train/run_test.sh new file mode 100644 index 0000000000..08630b1ac0 --- /dev/null +++ b/src/tasks/predict_modality/methods/novel/train/run_test.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +# Run script for all test resources + +echo "GEX2ADT" +viash run src/tasks/predict_modality/methods/novel/train/config.vsh.yaml -- \ + --input_train_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod1.h5ad \ + --input_train_mod2 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod2.h5ad \ + --output output/model.pt + +# echo "ADT2GEX" +# viash run src/tasks/predict_modality/methods/novel/train/config.vsh.yaml -- \ +# --input_train_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod1.h5ad \ +# --input_train_mod2 resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/swap/train_mod2.h5ad \ +# --output output/model.pt + +# echo "GEX2ATAC" +# viash run src/tasks/predict_modality/methods/novel/train/config.vsh.yaml -- \ +# --input_train_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/train_mod1.h5ad \ +# --input_train_mod2 resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/normal/train_mod2.h5ad \ +# --output output/model.pt + +# echo "ATAC2GEX" +# viash run src/tasks/predict_modality/methods/novel/train/config.vsh.yaml -- \ +# --input_train_mod1 resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod1.h5ad \ +# --input_train_mod2 resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod2.h5ad \ +# --output output/model.pt + + diff --git a/src/tasks/predict_modality/methods/novel/train/script.py b/src/tasks/predict_modality/methods/novel/train/script.py new file mode 100644 index 0000000000..39ea8b4778 --- /dev/null +++ 
b/src/tasks/predict_modality/methods/novel/train/script.py @@ -0,0 +1,148 @@ +import sys + +import torch +from torch.utils.data import DataLoader +# from sklearn.model_selection import train_test_split + +import anndata as ad +import pickle + +#check gpu available +if (torch.cuda.is_available()): + device = 'cuda:0' #switch to current device + print('current device: gpu', flush=True) +else: + device = 'cpu' + print('current device: cpu', flush=True) + + +## VIASH START + +par = { + 'input_train_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod1.h5ad', + 'input_train_mod2': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_cite/normal/train_mod2.h5ad', + 'output_train_mod2': 'train_mod2.h5ad', + 'output': 'model.pt' +} + +meta = { + 'resources_dir': 'src/tasks/predict_modality/methods/novel', +} +## VIASH END + + +sys.path.append(meta['resources_dir']) +from helper_functions import train_and_valid, lsiTransformer, ModalityMatchingDataset +from helper_functions import ModelRegressionAtac2Gex, ModelRegressionAdt2Gex, ModelRegressionGex2Adt, ModelRegressionGex2Atac + +print('Load data', flush=True) + +input_train_mod1 = ad.read_h5ad(par['input_train_mod1']) +input_train_mod2 = ad.read_h5ad(par['input_train_mod2']) + +adata = input_train_mod2.copy() + +mod1 = input_train_mod1.uns['modality'] +mod2 = input_train_mod2.uns['modality'] + +input_train_mod1.X = input_train_mod1.layers['normalized'] +input_train_mod2.X = input_train_mod2.layers['normalized'] + +input_train_mod2_df = input_train_mod2.to_df() + +del input_train_mod2 + +print('Start train', flush=True) + + +# Check for zero divide +zero_row = input_train_mod1.X.sum(axis=0) == 0 + +rem_var = None +if True in zero_row: + rem_var = input_train_mod1[:, zero_row].var_names + input_train_mod1 = input_train_mod1[:, ~zero_row] + + +# select number of variables for LSI +n_comp = input_train_mod1.n_vars -1 if input_train_mod1.n_vars < 256 else 256 + +if mod1 != 'ADT': + lsi_transformer_gex = lsiTransformer(n_components=n_comp) + input_train_mod1_df = lsi_transformer_gex.fit_transform(input_train_mod1) +else: + input_train_mod1_df = input_train_mod1.to_df() + +# reproduce train/test split from phase 1 +batch = input_train_mod1.obs["batch"] +train_ix = [ k for k,v in enumerate(batch) if v not in {'s1d2', 's3d7'} ] +test_ix = [ k for k,v in enumerate(batch) if v in {'s1d2', 's3d7'} ] + +train_mod1 = input_train_mod1_df.iloc[train_ix, :] +train_mod2 = input_train_mod2_df.iloc[train_ix, :] +test_mod1 = input_train_mod1_df.iloc[test_ix, :] +test_mod2 = input_train_mod2_df.iloc[test_ix, :] + +n_vars_train_mod1 = train_mod1.shape[1] +n_vars_train_mod2 = train_mod2.shape[1] +n_vars_test_mod1 = test_mod1.shape[1] +n_vars_test_mod2 = test_mod2.shape[1] + +n_vars_mod1 = input_train_mod1_df.shape[1] +n_vars_mod2 = input_train_mod2_df.shape[1] + +if mod1 == 'ATAC' and mod2 == 'GEX': + dataset_train = ModalityMatchingDataset(train_mod1, train_mod2) + dataloader_train = DataLoader(dataset_train, 256, shuffle = True, num_workers = 8) + + dataset_test = ModalityMatchingDataset(test_mod1, test_mod2) + dataloader_test = DataLoader(dataset_test, 64, shuffle = False, num_workers = 8) + + model = ModelRegressionAtac2Gex(n_vars_mod1,n_vars_mod2).to(device) + optimizer = torch.optim.AdamW(model.parameters(), lr=0.00008386597445284492,weight_decay=0.000684887347727808) + +elif mod1 == 'ADT' and mod2 == 'GEX': + dataset_train = ModalityMatchingDataset(train_mod1, train_mod2) + dataloader_train = 
DataLoader(dataset_train, 64, shuffle = True, num_workers = 4)
+
+    dataset_test = ModalityMatchingDataset(test_mod1, test_mod2)
+    dataloader_test = DataLoader(dataset_test, 32, shuffle = False, num_workers = 4)
+
+    model = ModelRegressionAdt2Gex(n_vars_mod1,n_vars_mod2).to(device)
+    optimizer = torch.optim.Adam(model.parameters(), lr=0.00041, weight_decay=0.0000139)
+
+
+elif mod1 == 'GEX' and mod2 == 'ADT':
+    dataset_train = ModalityMatchingDataset(train_mod1, train_mod2)
+    dataloader_train = DataLoader(dataset_train, 32, shuffle = True, num_workers = 8)
+
+    dataset_test = ModalityMatchingDataset(test_mod1, test_mod2)
+    dataloader_test = DataLoader(dataset_test, 64, shuffle = False, num_workers = 8)
+
+    model = ModelRegressionGex2Adt(n_vars_mod1,n_vars_mod2).to(device)
+    optimizer = torch.optim.AdamW(model.parameters(), lr=0.000034609210829678734, weight_decay=0.0009965881574697426)
+
+
+elif mod1 == 'GEX' and mod2 == 'ATAC':
+    dataset_train = ModalityMatchingDataset(train_mod1, train_mod2)
+    dataloader_train = DataLoader(dataset_train, 64, shuffle = True, num_workers = 8)
+
+    dataset_test = ModalityMatchingDataset(test_mod1, test_mod2)
+    dataloader_test = DataLoader(dataset_test, 64, shuffle = False, num_workers = 8)
+
+    model = ModelRegressionGex2Atac(n_vars_mod1,n_vars_mod2).to(device)
+    optimizer = torch.optim.AdamW(model.parameters(), lr=0.00001806762345275399, weight_decay=0.0004084171379280058)
+
+loss_fn = torch.nn.MSELoss()
+train_and_valid(model, optimizer, loss_fn, dataloader_train, dataloader_test, par['output'], device)
+
+# Add model dims for use in the predict component
+adata.uns["model_dim"] = {"mod1": n_vars_mod1, "mod2": n_vars_mod2}
+# Store names of removed all-zero variables so the predict component can drop them from the test data
+if rem_var is not None:
+    adata.uns["removed_vars"] = [rem_var[0]]
+adata.write_h5ad(par['output_train_mod2'], compression="gzip")
+
+if mod1 != 'ADT':
+    with open(par['output_transform'], 'wb') as f:
+        pickle.dump(lsi_transformer_gex, f)
+
diff --git a/src/tasks/predict_modality/methods/random_forest/config.vsh.yaml b/src/tasks/predict_modality/methods/random_forest/config.vsh.yaml
new file mode 100644
index 0000000000..a1ee69041d
--- /dev/null
+++ b/src/tasks/predict_modality/methods/random_forest/config.vsh.yaml
@@ -0,0 +1,37 @@
+__merge__: ../../api/comp_method.yaml
+functionality:
+  name: random_forest
+  status: disabled # disabled due to long execution times
+  info:
+    label: Random Forests
+    summary: Random forest regression.
+    description: A random forest regression method.
+    reference: breiman2001random
+    documentation_url: https://www.stat.berkeley.edu/~breiman/RandomForests/reg_home.htm
+    repository_url: https://github.com/cran/randomForest
+    preferred_normalization: log_cp10k
+  arguments:
+    - name: "--distance_method"
+      type: "string"
+      default: "pearson"
+      description: The distance method to use. Possible values are euclidean, pearson, spearman and others.
+    - name: "--n_pcs"
+      type: "integer"
+      default: 20
+      description: Number of principal components to use.
+    - name: "--n_trees"
+      type: "integer"
+      default: 50
+      description: Number of trees to use.
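+  # How script.R uses these arguments: the normalised mod1 train and test profiles
+  # are embedded together with lmds::lmds (`n_pcs` dimensions, `distance_method`),
+  # then one ranger random forest with `n_trees` trees is fitted per mod2 feature
+  # on the training embedding and used to predict that feature for the test cells.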
+ resources: + - type: r_script + path: script.R +platforms: + - type: docker + image: openproblems/base_r:1.0.0 + setup: + - type: r + cran: [ lmds, ranger, pbapply] + - type: nextflow + directives: + label: [hightime, highmem, highcpu] \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/random_forest/script.R b/src/tasks/predict_modality/methods/random_forest/script.R new file mode 100644 index 0000000000..e148eefbf7 --- /dev/null +++ b/src/tasks/predict_modality/methods/random_forest/script.R @@ -0,0 +1,83 @@ +cat("Loading dependencies\n") +requireNamespace("anndata", quietly = TRUE) +requireNamespace("pbapply", quietly = TRUE) +library(Matrix, warn.conflicts = FALSE, quietly = TRUE) + +## VIASH START +path <- "output/datasets/predict_modality/openproblems_bmmc_multiome_phase1_mod1/openproblems_bmmc_multiome_phase1_mod1.censor_dataset.output_" +par <- list( + input_train_mod1 = paste0(path, "train_mod1.h5ad"), + input_test_mod1 = paste0(path, "test_mod1.h5ad"), + input_train_mod2 = paste0(path, "train_mod2.h5ad"), + output = "output.h5ad", + n_pcs = 20L, + n_trees = 50L +) +meta <- list(functionality_name = "foo") +## VIASH END + +n_cores <- parallel::detectCores(all.tests = FALSE, logical = TRUE) + +cat("Reading mod1 files\n") +input_train_mod1 <- anndata::read_h5ad(par$input_train_mod1) +input_test_mod1 <- anndata::read_h5ad(par$input_test_mod1) + +dataset_id <- input_train_mod1$uns[["dataset_id"]] + +cat("Performing DR on the mod1 values\n") +dr <- lmds::lmds( + rbind(input_train_mod1$layers[["normalized"]], input_test_mod1$layers[["normalized"]]), + ndim = par$n_pcs, + distance_method = par$distance_method +) + +ix <- seq_len(nrow(input_train_mod1)) +dr_train <- as.data.frame(dr[ix, , drop = FALSE]) +dr_test <- as.data.frame(dr[-ix, , drop = FALSE]) +dr_train <- dr[ix, , drop = FALSE] +dr_test <- dr[-ix, , drop = FALSE] + +rm(input_train_mod1, input_test_mod1) +gc() + + +cat("Reading mod2 files\n") +X_mod2 <- anndata::read_h5ad(par$input_train_mod2)$layers[["normalized"]] + +cat("Predicting for each column in modality 2\n") +preds <- pbapply::pblapply( + seq_len(ncol(X_mod2)), + cl = n_cores, + function(i) { + y <- X_mod2[, i] + uy <- unique(y) + if (length(uy) > 1) { + rf <- ranger::ranger( + x = dr_train, + y = y, + num.trees = par$n_trees + ) + stats::predict(rf, dr_test)$prediction + } else { + rep(uy, nrow(dr_test)) + } + } +) + +cat("Creating outputs object\n") +prediction <- Matrix::Matrix(do.call(cbind, preds), sparse = TRUE) +rownames(prediction) <- rownames(dr_test) +colnames(prediction) <- colnames(X_mod2) + +out <- anndata::AnnData( + layers = list(normalized = prediction), + shape = dim(prediction), + uns = list( + dataset_id = dataset_id, + method_id = meta$functionality_name + ) +) + + +cat("Writing predictions to file\n") +zzz <- out$write_h5ad(par$output, compression = "gzip") diff --git a/src/tasks/predict_modality/methods/simple_mlp/predict/config.vsh.yaml b/src/tasks/predict_modality/methods/simple_mlp/predict/config.vsh.yaml new file mode 100644 index 0000000000..ef972e416f --- /dev/null +++ b/src/tasks/predict_modality/methods/simple_mlp/predict/config.vsh.yaml @@ -0,0 +1,21 @@ +__merge__: ../../../api/comp_method_predict.yaml +functionality: + name: simplemlp_predict + resources: + - type: python_script + path: script.py + - path: ../resources/ +platforms: + - type: docker + # image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime + image: openproblems/base_pytorch_nvidia:1.0.0 + # run_args: ["--gpus all --ipc=host"] + setup: + - type: 
python + pypi: + - scikit-learn + - scanpy + - pytorch-lightning + - type: nextflow + directives: + label: [highmem, hightime, midcpu, gpu, highsharedmem] \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/simple_mlp/predict/script.py b/src/tasks/predict_modality/methods/simple_mlp/predict/script.py new file mode 100644 index 0000000000..b67284e348 --- /dev/null +++ b/src/tasks/predict_modality/methods/simple_mlp/predict/script.py @@ -0,0 +1,104 @@ +from glob import glob +import sys +import numpy as np +from scipy.sparse import csc_matrix +import anndata as ad +import torch +from torch.utils.data import TensorDataset,DataLoader + +## VIASH START +par = { + 'input_train_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod1.h5ad', + 'input_train_mod2': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/swap/train_mod2.h5ad', + 'input_test_mod1': 'resources_test/predict_modality/openproblems_neurips2021/bmmc_multiome/swap/test_mod1.h5ad', + 'input_model': 'output/model', + 'output': 'output/prediction' +} +meta = { + 'resources_dir': 'src/tasks/predict_modality/methods/simple_mlp', + 'cpus': 10 +} +## VIASH END + +resources_dir = f"{meta['resources_dir']}/resources" +sys.path.append(resources_dir) +from models import MLP +import utils + +def _predict(model,dl): + model = model.cuda() + model.eval() + yps = [] + for x in dl: + with torch.no_grad(): + yp = model(x[0].cuda()) + yps.append(yp.detach().cpu().numpy()) + yp = np.vstack(yps) + return yp + + +print('Load data', flush=True) +input_train_mod2 = ad.read_h5ad(par['input_train_mod2']) +input_test_mod1 = ad.read_h5ad(par['input_test_mod1']) + +# determine variables +mod_1 = input_test_mod1.uns['modality'] +mod_2 = input_train_mod2.uns['modality'] + +task = f'{mod_1}2{mod_2}' + +print('Load ymean', flush=True) +ymean_path = f"{par['input_model']}/{task}_ymean.npy" +ymean = np.load(ymean_path) + +print('Start predict', flush=True) +if task == 'GEX2ATAC': + y_pred = ymean*np.ones([input_test_mod1.n_obs, input_test_mod1.n_vars]) +else: + folds = [0, 1, 2] + + ymean = torch.from_numpy(ymean).float() + yaml_path=f"{resources_dir}/yaml/mlp_{task}.yaml" + config = utils.load_yaml(yaml_path) + X = input_test_mod1.layers["normalized"].toarray() + X = torch.from_numpy(X).float() + + te_ds = TensorDataset(X) + + yp = 0 + for fold in folds: + # load_path = f"{par['input_model']}/{task}_fold_{fold}/version_0/checkpoints/*" + load_path = f"{par['input_model']}/{task}_fold_{fold}/**.ckpt" + print(load_path) + ckpt = glob(load_path)[0] + model_inf = MLP.load_from_checkpoint( + ckpt, + in_dim=X.shape[1], + out_dim=input_test_mod1.n_vars, + ymean=ymean, + config=config + ) + te_loader = DataLoader( + te_ds, + batch_size=config.batch_size, + num_workers=0, + shuffle=False, + drop_last=False + ) + yp = yp + _predict(model_inf, te_loader) + + y_pred = yp/len(folds) + +y_pred = csc_matrix(y_pred) + +adata = ad.AnnData( + layers={"normalized": y_pred}, + shape=y_pred.shape, + uns={ + 'dataset_id': input_test_mod1.uns['dataset_id'], + 'method_id': meta['functionality_name'], + }, +) + +print('Write data', flush=True) +adata.write_h5ad(par['output'], compression = "gzip") \ No newline at end of file diff --git a/src/tasks/predict_modality/methods/simple_mlp/resources/models.py b/src/tasks/predict_modality/methods/simple_mlp/resources/models.py new file mode 100644 index 0000000000..25ce9b2995 --- /dev/null +++ b/src/tasks/predict_modality/methods/simple_mlp/resources/models.py @@ 
-0,0 +1,68 @@ +import torch +import pytorch_lightning as pl +import torch.nn as nn +import torch.nn.functional as F + +class MLP(pl.LightningModule): + def __init__(self,in_dim,out_dim,ymean,config): + super(MLP, self).__init__() + self.ymean = ymean.cuda() + H1 = config.H1 + H2 = config.H2 + p = config.dropout + self.config = config + self.fc1 = nn.Linear(in_dim, H1) + self.fc2 = nn.Linear(H1,H2) + self.fc3 = nn.Linear(H1+H2, out_dim) + self.dp2 = nn.Dropout(p=p) + + def forward(self, x): + x0 = x + x1 = F.relu(self.fc1(x)) + x1 = self.dp2(x1) + x = F.relu(self.fc2(x1)) + x = torch.cat([x,x1],dim=1) + x = self.fc3(x) + x = self.apply_mask(x) + return x + + def apply_mask(self,yp): + tmp = torch.ones_like(yp).float()*self.ymean + mask = tmp